FLIP(1)                      OGMP SEQUENCE UTILITIES                    FLIP(1)

NAME
  flip - translates and reformats DNA sequences

SYNOPSIS
  flip [-S] [-c] [-f] [-l length] [-L length] [-m] [-d deviation] 
       [-s start_codons] [-g gen_code] [-n] [-D] [-F] file

DESCRIPTION
  Flip translates and reformats DNA sequences. The output files can be used
  for further analysis by other programs such as BBLAST (Pierre Rioux, 1994,
  a batch version of BLAST (Altschul et al., J. Mol. Biol. 215, 403-410)),
  FASTA (W. Pearson, Methods Enzymol. 183, 63-68) and NIP (Queen & Korn,
  Nucl. Acids Res. 12, 581-599).
  
  Input file: 
  The input file ("file") contains either one or several nucleotide 
  sequences. Each sequence has a header line that starts with a ">" in the 
  first column, optionally followed by a header text that describes the
  particular sequence. If absent, a default sequence name is created by using
  the file name. Flip supports the FASTA sequence file format and its
  extensions, such as the masterfile format (see APPENDIX II) and the Staden
  format. The only sequence characters allowed are "acgtnACGTN" and, in
  addition, the special characters "!", "@", "#", "+" and "-", which may be
  used for labelling particular nucleotides or sequence regions (see
  masterfile format in APPENDIX II). Whitespace is ignored by flip. Note that
  flip will not allow the input file to be named 'prot.lst', 'prot.src',
  'nocompl', 'compl', 'prot.6rf' or 'prot.lst.dna', since these are the names
  of its output files.
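
  The input rules above can be illustrated with a minimal Python sketch. It is
  not flip's own code: the function name read_sequences is hypothetical, only
  plain FASTA-style input is handled (no masterfile annotations), and both the
  exact form of the default sequence name and the reaction to an illegal
  character (an exception here) are assumptions.

```python
import os

NUCLEOTIDES = "acgtnACGTN"
LABELS = "!@#+-"          # special labelling characters
RESERVED_NAMES = {"prot.lst", "prot.src", "nocompl", "compl",
                  "prot.6rf", "prot.lst.dna"}


def read_sequences(file_name, text):
    """Return a list of (name, sequence) pairs from FASTA-style text."""
    if os.path.basename(file_name) in RESERVED_NAMES:
        raise ValueError(f"{file_name}: name is reserved for an output file")
    allowed = set(NUCLEOTIDES + LABELS)
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Header text is optional; fall back to the file name
            # (the exact default name used by flip is an assumption).
            name = line[1:].strip() or os.path.basename(file_name)
            records.append([name, []])
            continue
        residues = "".join(line.split())      # whitespace is ignored
        if not residues:
            continue
        if not records:
            raise ValueError("sequence data found before any '>' header")
        bad = set(residues) - allowed
        if bad:
            raise ValueError(f"illegal sequence characters: {sorted(bad)}")
        records[-1][1].append(residues)
    return [(name, "".join(parts)) for name, parts in records]
```

  For example, read_sequences("demo.fa", ">seq1\nACGT nn\n") returns
  [("seq1", "ACGTnn")].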

  Output files:
  Flip produces four to six files, depending on the parameters used:

    - nocompl: this file contains the same sequence as the input file does but 
         reformatted to 60 sequence characters per line (nucleotides and 
         special characters) with the nucleotide count appearing in the left
         margin. 

    - compl: this file contains the complemented and reversed sequence (briefly
         referred to as complementary sequence), displayed in the same way as 
         in the nocompl file.

    - prot.lst: the listing of all proteins potentially encoded by the DNA
         sequence in all 6 reading frames. By default, flip reports the amino
         acid sequence of every protein that is at least 48 residues long,
         whether or not it begins with a start codon (this behaviour can be
         changed, see section "Parameters"). The DNA sequence is assumed to
         be linear by default, but flip also supports circular sequences (see
         section "Parameters").
         
         The prot.lst file contains:

         - A header text which consists of:

             i) The name of the DNA sequence from which the protein was 
                deduced.
            ii) An indicator of the strand on which the protein is encoded
               ("compl" = complementary strand and "orig" = original strand)
           iii) Coordinates of the protein-coding region of the DNA sequence:
                the first number is the position of the first nucleotide 
                associated with the protein, and the second one is the
                position of the last nucleotide associated with the protein
                (stop codon included, if any).
            iv) The nucleotide sequence context of the start and end of the
                protein-coding region: the 12 nucleotides upstream of the 5'
                end and the 12 nucleotides downstream of the 3' end, to
                facilitate locating the protein in the sequence file. The
                first and second dodecamers correspond to the first and second
                coordinates, respectively. Note that the coordinates and the
                displayed dodecamers always refer to the *forward* strand,
                regardless of the strand on which the protein lies. Also, flip
                might display fewer than 12 characters for circular molecules.

         - If the -L switch is used (see Parameters), a line with a semi-colon
           in the first column (a comment) containing all the amino acids 
           upstream of the start codon is displayed.

         - The protein sequence itself (including the stop sign "*" if a stop
           codon is present in the DNA sequence), formatted to 60 amino acids
           per line. Each line is preceded by the amino acid count.

         - The length of the protein by itself on a single, last line.

         The proteins are reported in prot.lst in the following order: all
         proteins on the original strand come first, sorted in increasing
         order of start positions, followed by those on the complementary
         strand, sorted in decreasing order of start positions.

    - prot.src: this file has essentially the same content as prot.lst, but
         in a format that fasta interprets as a concatenation of all the
         proteins present in prot.lst. To that end, a semi-colon is added in
         the first column of the header lines, no numbering is done and stop
         codons are not displayed. A file header is also added at the
         beginning of the file.
        
    - prot.6rf: this file is only produced when the "-f" option is used. It
         contains the translations of the 6 reading frames, displayed together
         with the DNA sequence.

    - prot.lst.dna: this file is only produced when the "-D" option is used. It
         contains the DNA sequence of all the orfs listed in prot.lst and
         closely resembles prot.lst. The header of each orf indicates the
         start and end positions, along with the strand on which the orf lies,
         and is followed by the DNA sequence itself, formatted to 60
         nucleotides (ACGTN) per line, with the nucleotide count appearing in
         front of each line.
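
  As an illustration of the nocompl and compl layouts described above, the
  following Python sketch reverse-complements a sequence and prints it 60
  characters per line with a count in the left margin. It is only a sketch:
  the function names are hypothetical, the margin width and the choice of
  showing the position of the first character of each line are assumptions,
  and special labelling characters are simply passed through unchanged.

```python
# Complement table for the nucleotides accepted by flip; any other
# character (e.g. a labelling character) is left untouched by translate().
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq):
    """Return the complemented and reversed sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def format_numbered(seq, width=60):
    """Split seq into lines of `width` characters, each preceded by the
    1-based position of its first character (assumed margin layout)."""
    lines = []
    for start in range(0, len(seq), width):
        lines.append(f"{start + 1:>8} {seq[start:start + width]}")
    return "\n".join(lines)
```

  Calling reverse_complement("ACGTn") gives "nACGT"; passing the result to
  format_numbered() yields the line-by-line layout.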


  The genetic code passed on the command line (-g switch, see OPTIONS) takes
  precedence. If there is no -g switch on the command line, flip scans the
  file prot.prm (if any, see FILES) for an appropriate genetic code, using the
  masterfile name. If none can be found, genome_name.lst is consulted. If
  none of these steps provides flip with a suitable genetic code, the "Basic"
  code (number 20, see APPENDIX I) is used for translation. It is also
  possible to specify deviations from the genetic code in use or to apply
  other translation codes (see "Parameters" and APPENDIX I).
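
  The lookup order described above can be summarised by a short Python
  sketch. The function and argument names are hypothetical; each argument
  stands for the code number obtained from one source, or None when that
  source provides nothing (including the case where prot.prm is not read).
  Rejecting an unsupported number with an exception is an assumption.

```python
DEFAULT_GEN_CODE = 20                                  # the "Basic" code
SUPPORTED = set(range(1, 7)) | set(range(9, 22))       # 1-6 and 9-21


def resolve_gen_code(cli_code=None, prm_code=None, lst_code=None):
    """Pick the genetic code: -g switch, then prot.prm, then
    genome_name.lst, then the default."""
    for candidate in (cli_code, prm_code, lst_code):
        if candidate is not None:
            if candidate not in SUPPORTED:
                raise ValueError(f"unsupported genetic code: {candidate}")
            return candidate
    return DEFAULT_GEN_CODE
```

  For instance, resolve_gen_code(prm_code=4, lst_code=11) returns 4, and
  resolve_gen_code() returns 20.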

  Parameters:
  In the absence of command line arguments other than the input file name, -g
  or -S, flip attempts to find a file named "prot.prm" in the current 
  directory. This file contains values of parameters related to the 
  application. If flip cannot find prot.prm, it uses the default parameter
  values. If command line switches other than -g or -S are used, prot.prm is 
  not read.

  A typical prot.prm file follows:

  # All lines whose first non-blank character is "#" are comments
  # Blank lines ignored.
  # Everything is case-insensitive, except the "Deviation" line (see below)
  
  MinOrfLength = 20
  Circular = 1
  6RF = 1
  Starts = att
  Deviation = att=l,agg=*,tga=n
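
  A file of this kind could be read with a few lines of Python. The sketch
  below is not flip's parser: the name parse_prm is hypothetical, and raising
  an exception on an unknown identifier, or when MinOrfLength and
  MinProtLength are both present, is an assumption about error handling.

```python
# Canonical spelling of each identifier, keyed by its lowercase form
# (identifiers are case-insensitive).
KNOWN_IDS = {name.lower(): name for name in (
    "MinProtLength", "MinOrfLength", "Circular", "6RF",
    "Deviation", "Starts", "GenCode")}
INT_IDS = {"MinProtLength", "MinOrfLength", "Circular", "6RF", "GenCode"}


def parse_prm(text):
    """Parse prot.prm-style text into a dict of parameter values."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        key, sep, value = line.partition("=")   # split at the first "="
        ident = KNOWN_IDS.get(key.strip().lower())
        if not sep or ident is None:
            raise ValueError(f"unrecognised line: {raw!r}")
        value = value.strip()
        if ident in INT_IDS:
            params[ident] = int(value)
        elif ident == "Starts":
            params[ident] = [c.strip().upper() for c in value.split(",")]
        else:
            params[ident] = value         # Deviation: case is significant
    if "MinOrfLength" in params and "MinProtLength" in params:
        raise ValueError("MinOrfLength and MinProtLength are exclusive")
    return params
```

  Applied to the sample above, the result maps "MinOrfLength" to 20,
  "Starts" to ["ATT"] and "Deviation" to the unchanged string
  "att=l,agg=*,tga=n".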

  Each line has the form id = value, where id can be: MinProtLength, 
  MinOrfLength, Circular, 6RF, Deviation, Starts, GenCode (all case-
  insensitive). The effect of each line on flip's behaviour is explained below:

  MinOrfLength = length
     The minimum length (in amino acids) a protein must have in order to be 
     reported by flip. Defaults to 48. Note that MinOrfLength and MinProtLength
     cannot be set at the same time. The corresponding command line switch is 
     "-l" (see OPTIONS). Note that the ending stop codon (if any) is never 
     taken into account during the length computation.

  MinProtLength = length
     The minimum length (in amino acids), counted from the first start codon 
     occurring in a reading frame, a protein must have in order to be reported 
     by flip. If the protein doesn't have a start codon, it is not reported. 
     Note that MinOrfLength and MinProtLength cannot be set at the same time. 
     By default, MinProtLength is 0, and so flip will not even check whether
     an orf has a start codon (in other words, as long as the minimum 
     length imposed by MinOrfLength is respected, the protein is reported). The
     corresponding command line switch is "-L" (see OPTIONS). Note that the 
     ending stop codon (if any) is never taken into account during the length 
     computation.
     
  Circular = 0|1
     Assume that the sequence is circular if set to 1. Flip will verify that 
     the input file contains only one sequence header. By default, the sequence
     is taken to be linear. The corresponding command line switch is "-c" (see
     OPTIONS).

  6RF = 0|1
     Produce the file prot.6rf if set to 1. By default, this file is not
     produced. The corresponding command line switch is "-f" (see OPTIONS).

  Starts = s_codons
     Use s_codons as the start codons, instead of the ones defined in the
     genetic code. s_codons must be a list of codons separated by commas, of
     the form "ACT,ATT", and is case-insensitive. Note that specifying new
     start codons doesn't alter the genetic code. Rather, it states that the
     given codon(s) are legal start codons and are thus considered if
     MinProtLength is specified (see above). Note that if the start codon
     should be translated as a Met, the "-m" command line switch should be
     used. The command line switch that is used to redefine the start codons
     is "-s" (see OPTIONS).

  Deviation = dev_string
     Apply the specified deviations from the universal code. The value of
     dev_string must be a string of the form "atc=r,ttt=p,agt=\*" (note that
     the star is preceded by a backslash in order to avoid shell 
     interpretation). All the codon ids are case-insensitive, but the aa ids 
     are not: codons aTT and AtT refer to the same codon, but when defining
     ATA=m,ATT=M, the protein sequence will contain upper and lowercase M's,
     according to the codons present in the DNA sequence. "-d" (see OPTIONS)
     can also be used to specify deviations to the current genetic code.

  GenCode = digit
     Use NCBI's code number digit instead of the default basic code to
     perform translation. Digit can be any number from 1 to 21 except 7 and 8.
     Refer to APPENDIX I to see all the genetic codes supported by flip. By
     default, the "Basic" genetic code is used for translation (code number 20,
     see APPENDIX I). "-g" can also be used to specify a genetic code to use
     during translation (see OPTIONS). If this line does not appear in
     prot.prm, the genetic code found in genome_name.lst for the masterfile
     is used. If the masterfile does not have a designated genetic code, the
     default ("Basic", code 20) is used.
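
  To make the effect of a Deviation line concrete, here is a small Python
  sketch that applies a deviation string to the "Basic" code of APPENDIX I.
  The names code_table and apply_deviations are hypothetical, the string is
  taken as it would appear once any shell escaping has been removed, and
  raising an exception on a malformed entry is an assumption.

```python
BASES = "TCAG"    # column order used by the tables in APPENDIX I
BASIC_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def code_table(code=BASIC_CODE):
    """Map each codon (uppercase) to its amino acid letter."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, code))


def apply_deviations(table, dev_string):
    """Return a copy of table with the deviations applied, e.g.
    'att=l,agg=*,tga=n'. Codons are case-insensitive; the amino acid
    letter is kept exactly as written."""
    updated = dict(table)
    for item in dev_string.split(","):
        codon, sep, aa = item.strip().partition("=")
        codon = codon.upper()
        if not sep or codon not in updated or len(aa) != 1:
            raise ValueError(f"bad deviation: {item!r}")
        updated[codon] = aa
    return updated
```

  With the deviation string of the sample prot.prm, ATT is translated as "l",
  AGG as "*" and TGA as "n", while all other codons keep their "Basic"
  assignment.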


OPTIONS

  -d dev_string
     Apply the specified deviations to the current code. The parameter of 
     the "-d" switch must be a string of the form "atc=r,ttt=p,agt=\*" (note
     that the star is preceded by a backslash in order to avoid shell 
     interpretation). All the codon ids are case-insensitive, but the aa ids 
     are not: codons aTT and AtT refer to the same codon, but when defining
     ATA=m,ATT=M, the protein sequence will contain upper and lowercase M's,
     according to the codons present in the DNA sequence.

  -g digit 
     Use NCBI's code digit instead of the default basic code (code #20) to
     perform translation. Digit can be any number from 1 to 21 except 7 and 8.
     To see all the genetic codes supported by flip, see APPENDIX I. If this 
     switch is not used, prot.prm is consulted to see if the current
     masterfile has an associated genetic code. If so, flip will use it. 
     Otherwise, genome_name.lst (see FILES) is consulted. If flip does not find
     a genetic code in this file, it uses the default (code #20).

  -f Produce the file prot.6rf. By default, this file is not produced.

  -c Assume the sequence is circular. Flip will verify that the input file
     contains only one sequence header. By default, the sequence is taken to 
     be linear.

  -s s_codons
     Use s_codons as the start codons, instead of the ones defined in the
     genetic code. s_codons must be a list of codons separated by commas, of
     the form "ACT,ATT", and is case-insensitive. Note that specifying new
     start codons doesn't alter the genetic code. Rather, it states that the
     given codon(s) are legal start codons and are thus considered if the -L
     option (see below) is used. Note that if the start codon should be
     translated as a Met, the "-m" command line switch should be used.

  -l length
     The minimum length (in aa) a protein must have in order to be reported by
     flip. Defaults to 48. Note that -l and -L (see below) cannot be set at the
     same time. Note that the ending stop codon (if any) is never taken into
     account during the length computation.

  -L length
     The minimum length (in aa), counted from the first start codon occurring 
     in a reading frame, a protein must have in order to be reported by flip. 
     If the protein doesn't have a start codon, it is not reported. Note that
     -L and -l (see above) cannot be set at the same time. By default, flip
     will not even check whether an orf has a start codon (in other words, as
     long as the minimum length imposed by -l is respected, the orf is 
     reported). Note that the ending stop codon (if any) is never taken into
     account during the length computation.

  -m
     With this switch, flip will translate the first codon of a protein as 'M'
     if the codon is a start codon. The default start codons for each genetic 
     code can be seen in APPENDIX I. The default is to translate every start
     codon as the amino acid it would produce if it appeared in a position
     other than the first. Note that there is no corresponding identifier in
     prot.prm.

  -S
     Silent mode. In this mode, flip will not display anything on stdout, 
     except its version number. Note that there is no corresponding
     identifier in prot.prm.

  -n
     Append "_<digit>" (where digit is the orf's rank) to the header of each
     orf listed in prot.lst, prot.src and prot.lst.dna (see above for a
     description of prot.lst's header).

  -D
     Produce the file 'prot.lst.dna', which contains the DNA sequence of each
     orf reported in prot.lst. By default, this file is not produced.

  -F
     Separate the contigs of nocompl and compl by a line containing only a "*".
     This is useful when these files are used as libraries by fasta(1): it
     ensures that fasta considers the sequences in these files as separate
     sequences.
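
  The interplay of the -l, -L and -m switches can be sketched in Python as
  follows. This is an interpretation of the descriptions above, not flip's
  implementation: report_orf is a hypothetical name, and an orf is assumed to
  be given as the list of its codons with the ending stop codon already
  removed.

```python
def report_orf(codons, table, starts, min_orf=48, min_prot=None,
               met_start=False):
    """Return the reported protein string, or None if the orf is skipped.

    codons    : uppercase codons of the orf, stop codon excluded
    table     : dict mapping codon -> amino acid letter
    starts    : set of legal start codons
    min_orf   : -l threshold, used when min_prot is None
    min_prot  : -L threshold, counted from the first start codon
    met_start : -m, translate a leading start codon as 'M'
    """
    if min_prot is None:
        # -l: the whole orf counts; no start codon is required.
        if len(codons) < min_orf:
            return None
        first = 0
    else:
        # -L: the length is counted from the first start codon.
        first = next((i for i, c in enumerate(codons) if c in starts), None)
        if first is None or len(codons) - first < min_prot:
            return None
    protein = [table[c] for c in codons[first:]]
    if met_start and protein and codons[first] in starts:
        protein[0] = "M"
    return "".join(protein)
```

  With table = {"GCC": "A", "TTG": "L"}, starts = {"TTG"} and the codons
  ["GCC", "TTG", "GCC", "GCC"], a threshold of min_orf=3 reports "ALAA",
  whereas min_prot=3 reports "LAA", or "MAA" when met_start is also set.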

     
FILES
  prot.lst 
     List of all the orfs found
  prot.lst.dna
     DNA sequence of all the orfs reported in prot.lst
  prot.src
     List of all the proteins found (fasta format)
  nocompl
     Original sequence (formatted).
  compl
     Complemented reversed sequence (formatted).
  prot.6rf
     All six reading frames
  prot.prm
     File containing some parameter values for the application
  /share/supported/apps/ogmp/lib/genome_name.lst
     Is consulted by flip when the -g switch is not used and when there is no 
     genetic code specification in prot.prm (or no file prot.prm at all).


APPENDIX I
  In this section, all the genetic codes supported by flip are described. These
  codes are identical to the ones used at NCBI except for codes 16 to 21, which
  have been added by the OGMP. A given genetic code has five fields:

   i) "Name" states the name of the genetic code.
  ii) "Id" is the number used by flip to refer to this code. This number can be
      used as a parameter to the "-g" switch.
 iii) "Code" gives, for each codon, the amino acid it codes for.
  iv) "Starts". If an "M" appears in the column associated with a given codon, 
      then this codon is a start codon according to the current genetic code.
   v) "Codons". All the codons, in column representation.

  There are 19 genetic codes, having ids from 1 to 6 and 9 to 21. 


  Name  : Standard
  Id    : 1
  Code  : FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: ---M---------------M---------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Vertebrate Mitochondrial
  Id    : 2
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
  Starts: --------------------------------MMMM---------------M------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Yeast Mitochondrial
  Id    : 3
  Code  : FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate
          Mitochondrial; Mycoplasma; Spiroplasma
  Id    : 4 
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: --MM---------------M------------MMMM---------------M------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Invertebrate Mitochondrial
  Id    : 5
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
  Starts: ---M----------------------------MMMM---------------M------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
  Id    : 6
  Code  : FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Echinoderm Mitochondrial
  Id    : 9
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Euplotid Nuclear
  Id    : 10
  Code  : FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Bacterial
  Id    : 11
  Code  : FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: ---M---------------M------------MMMM---------------M------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Alternative Yeast Nuclear
  Id    : 12
  Code  : FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -------------------M---------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Ascidian Mitochondrial
  Id    : 13
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Flatworm Mitochondrial
  Id    : 14
  Code  : FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Blepharisma Macronuclear
  Id    : 15
  Code  : FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : TAG-Leu
  Id    : 16
  Code  : FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : TGA-Trp with GTG-initiation
  Id    : 17
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M---------------M------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : TGA-Trp
  Id    : 18
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Tetrahymena Mitochondrial
  Id    : 19
  Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: ---M----------------------------M-MM---------------M------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : Basic 
  Id    : 20
  Code  : FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


  Name  : TAG-Leu,TCA-stop
  Id    : 21
  Code  : FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts: -----------------------------------M----------------------------
          TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
          TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
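
  The column representation used in the tables above can be decoded
  mechanically: the three "Codons" rows enumerate the codons with each base
  running through T, C, A, G, the first base varying slowest. The Python
  sketch below rebuilds that order, reads the "Code" and "Starts" rows of the
  Standard code (Id 1) and translates a short sequence. The function names
  are hypothetical, and emitting "X" for a codon that is not in the table
  (for example one containing N) is an assumption, not documented behaviour.

```python
BASES = "TCAG"
# "Code" and "Starts" rows of the Standard code (Id 1).
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STARTS = "---M---------------M---------------M----------------------------"


def decode(code, starts):
    """Return (codon -> amino acid dict, list of start codons)."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    table = dict(zip(codons, code))
    start_codons = [c for c, flag in zip(codons, starts) if flag == "M"]
    return table, start_codons


def translate(dna, table):
    """Translate dna in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))
```

  For the Standard code, decode(CODE, STARTS) gives the start codons TTG, CTG
  and ATG, and translating "atggcctaa" with the resulting table gives "MA*".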


APPENDIX II

  This appendix describes briefly the masterfile format used by the OGMP. The
  masterfile (MF) provides an integration of DNA sequence and its feature 
  annotations, with the feature annotations embedded at the corresponding 
  coordinates in the sequence. The naming rules for genetic elements have been
  designed such that they are easily interpretable by a human reader, but are
  at the same time computer-readable. The OGMP MF format is a superset of the 
  FASTA format. In collaboration with NCBI, we have developed tools that 
  permit translation of an MF directly into a GenBank record (in ASN.1 
  format), and vice versa, facilitating sequence submission to NCBI. For more 
  information on the MF format, please consult:

        http://megasun.bch.umontreal.ca/People/lang/ogmp-mf/intro.html


  Flip was first published in bionet.software. The reference is:

      Subject:      FLIP: a Unix C program used to find/translate orfs
      From:         Nicolas Brossard <brossard@bch.umontreal.ca>
      Date:         1997/11/25
      Message-ID:   <347B3A1B.794BDF32@bch.umontreal.ca>
      Newsgroups:   bionet.software


AUTHOR
   Nicolas Brossard (program design and coding of current version).
   Gertraud Burger (project management).
   B. Franz Lang and Gertraud Burger (Fortran version of flip, Nucl. Acids Res.
   14, 455-465).
   Pierre Rioux (support in program design).

Organelle Genome Megasequencing Program, NOV97.