MFMAIN(1)                  OGMP SEQUENCE UTILITIES                  MFMAIN(1)

NAME
       mfmain - automatically annotate OGMP masterfiles

SYNOPSIS
       To annotate tracks from a tracks.res file:

              mfmain [-B] [tracks.res]

       To annotate reading frames from a prot.lst file:

              mfmain [-B] -P masterfile [-g] prot.lst

       To add range or point annotations from a coordinates file:

              mfmain [-B] -C masterfile coorfile

       To add tRNA annotations from an OGMP trnascan(1) output file:

              mfmain [-B] -T masterfile trnascan.output

DESCRIPTION
       mfmain is a general-purpose masterfile (mf) maintenance program for
       adding annotations to mfs. The input for these annotations can come
       from a variety of sources, but all of these sources have the
       following in common:

       1. the name of the mf;
       2. the contig within the mf to be annotated;
       3. the offset, or position from the first base in the contig, where
          the annotation is to be inserted;
       4. the annotation itself (text);

       and in certain cases (ideally all cases):

       5. the precise sequence around the insertion point.

       Together these features guarantee fast, accurate and unambiguous
       annotation insertion. The inputs to mfmain are explained below.

TRACKS
       mfmain reads a file containing the output of the tracks(1) program
       (by default, "tracks.res") and attempts to annotate the masterfiles
       according to the information stored there. This "resfile" is first
       read once and checked for validity; all errors are fatal, since this
       file should never be edited by a user and masterfile maintenance is
       critical. Once the file has been checked, all its records are
       processed sequentially and annotations are added to the masterfiles.

       If a match was found on the non-complementary strand, the following
       annotation will be added:

              ; clonename (size) **==>

       where clonename is extracted from the "Track" field of the resfile
       and size is a number between 0.1 and 9.9 that mfmain prompts the
       user for interactively.
       If a match was found on the complementary strand, the annotation
       will be similar except that the arrow will point the other way.
       Annotations are inserted at the correct nucleotide location of the
       masterfile block that was matched by tracks(1).

PROT.LST FORMATTED FILES
       mfmain will read prot.lst formatted files as supplied by flip(1) or
       protfilt(1). In such instances, annotations will be inserted with
       identifiers starting with the string "P-" followed by an arbitrary
       but unique string. These annotations will be in pairs, in the form

              ; P-aa1026951 ==>
              ; end ==> P-aa1026951

       for orfs on the non-complementary strand, and in the form

              ; end <== P-aa1026951
              ; P-aa1026951 <==

       for orfs on the complementary strand. These pairs of annotations
       will surround the orf, at the positions specified in the prot.lst
       formatted file.

       If the -g option is supplied, this naming convention is ignored and
       names are assumed to have been supplied manually by the user in the
       prot.lst file, immediately after the greater-than sign that
       identifies records and before the contig name. See the OPTIONS
       section for an example.

COORDINATES FORMATTED FILES
       As with prot.lst formatted files, mfmain will read coordinates
       formatted files as generated by gb2mfc(1) and mf2stad(1).

OGMP SCANTRNA OUTPUT FILES
       These input files are internally converted into coordinates file
       format, and the annotations are then added accordingly. The only
       consequence of this method is that the trnascan.output file left
       behind after the program has run is in fact in coordinates file
       format, which can be confusing for the user.

       Only tRNA records with a single asterisk ("*") before the words
       "start position=" will be annotated; these asterisks are a means of
       letting the user decide which tRNAs to annotate.

OPTIONS
       -B
              Do not make a backup of the modified masterfiles before
              updating them. The default is to keep a copy of each
              masterfile with an extension of ".bak".
              This option also applies to the input file supplied as the
              last argument.

       -P masterfile
              This option informs mfmain that inputfile is assumed to be in
              prot.lst format (see flip(1)). masterfile is the ONLY
              masterfile that is going to be annotated, and therefore all
              contigs specified in inputfile must reference a contig in
              masterfile. See the section "PROT.LST FORMATTED FILES".

       -g
              Used with the -P option, this tells mfmain that the prot.lst
              file has been extended by the user to supply gene names for
              each reading frame. Immediately after the greater-than sign,
              the program expects either 1) an equal sign and a gene name,
              as in "=nad9", or 2) a single hash sign, "#". In the second
              case, a name of the form "orfNNN" will be generated by the
              program, where NNN is the number of amino acids in the
              prot.lst record. If neither of the two specifications is
              found, the prot.lst record will be skipped.

              Examples of prot.lst headers:

              Will generate a G-orf87 range annotation:

                     ># 1182gsm87; orig. 3814 [3805] to 4077

              Will generate a G-nad9 range annotation:

                     >=nad9 1182gsm87; compl. 7580 to 7218

              Will be skipped:

                     >1182gsm87; compl. 7580 to 7218

       -C masterfile
              This option informs mfmain that inputfile is assumed to be in
              coordinates format (see gb2mfc(1) and mf2stad(1)). masterfile
              is the ONLY masterfile that is going to be annotated, and
              therefore all contigs specified in inputfile must reference a
              contig in masterfile.

       -T masterfile
              This option informs mfmain that inputfile is assumed to be
              the output of the OGMP version of trnascan(1). For each tRNA
              identified in the trnascan.output file, a pair of annotations
              like these will be inserted in masterfile:

                     ; G-trnX(xxx) ==> start
                     ; G-trnX(xxx) ==> end

              The gene name is obtained from the trnascan.output file, and
              the gene can be annotated on either strand. masterfile is the
              ONLY masterfile that is going to be annotated, and therefore
              all contigs specified in inputfile must reference a contig in
              masterfile.
              Only tRNA records with a single asterisk ("*") before the
              words "start position=" will be annotated; these asterisks
              are a means of letting the user decide which tRNAs to
              annotate.

       inputfile
              Use this file rather than "tracks.res" or "prot.lst".

FILES
       /share/supported/apps/ogmp/lib/vectors.lst
              A database of basenames of cloning vector files.

       /tmp/mfmain.$USER.$$
              Temporary directory where masterfiles are edited.

       . (the current directory)
              Where masterfiles are expected to be found.

       tracks.res
              Output of tracks(1).

SEE ALSO
       tracks(1), coordinates(5), flip(1), gb2mfc(1), mf2stad(1)

AUTHORS
       Pierre Rioux, Tim Littlejohn (Project Management), Organelle Genome
       Megasequencing Project, Jun. 1994.
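
EXAMPLES
       The header rules described under the -g option can be sketched in a
       few lines of Python. This is an illustration only, not code taken
       from mfmain: the function name gene_annotation_name and its
       aa_count parameter are hypothetical, the "G-" prefix is taken from
       the header examples in OPTIONS, and the sketch assumes that the
       hash sign must be followed by white space and that the amino acid
       count has already been obtained from the record.

```python
import re
from typing import Optional

# Hypothetical helper, not part of mfmain: it mirrors the documented -g
# rules for what must follow the ">" of a prot.lst header.
#   "=name"  -> use the supplied gene name
#   "#"      -> generate "orfNNN" from the amino acid count
# Assumption: the "#" must stand alone (followed by white space or the
# end of the line).
_HEADER = re.compile(r"^>(?:=(?P<gene>\S+)|(?P<hash>#)(?=\s|$))")


def gene_annotation_name(header: str, aa_count: int) -> Optional[str]:
    """Return the G- annotation name for a header, or None if skipped."""
    match = _HEADER.match(header)
    if match is None:
        # Neither "=name" nor "#" was found: the record is skipped.
        return None
    if match.group("gene"):
        return "G-" + match.group("gene")
    # Hash form: the name is built from the number of amino acids.
    return "G-orf%d" % aa_count


if __name__ == "__main__":
    for line, count in [
        ("># 1182gsm87; orig. 3814 [3805] to 4077", 87),
        (">=nad9 1182gsm87; compl. 7580 to 7218", 120),
        (">1182gsm87; compl. 7580 to 7218", 120),
    ]:
        print(line, "->", gene_annotation_name(line, count))
```

       With the three headers shown under -g, the sketch yields G-orf87
       for the first (given 87 amino acids), G-nad9 for the second, and
       None, meaning the record is skipped, for the third. The count of
       120 passed for the last two headers is an arbitrary placeholder,
       since it is not used when a gene name is supplied or when the
       record is skipped.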