FLIP(1)                  OGMP SEQUENCE UTILITIES                  FLIP(1)

NAME
       flip - translates and reformats DNA sequences

SYNOPSIS
       flip [-S] [-c] [-f] [-l length] [-L length] [-m] [-d deviation]
            [-s start_codons] [-g gen_code] [-n] [-D] [-F] file

DESCRIPTION
       Flip translates and reformats DNA sequences. The output files it
       produces can be used for further analysis by other programs such
       as BBLAST (Pierre Rioux, 1994, a batch version of BLAST (Altschul
       et al., J. Mol. Biol. 215, 403-410)), FASTA (W. Pearson, Methods
       Enzymol. 183, 63-68) and NIP (Queen & Korn, Nucl. Acids Res. 12,
       581-599).

   Input file:
       The input file ("file") contains one or several nucleotide
       sequences. Each sequence has a header line that starts with a
       ">" in the first column, optionally followed by a header text
       that describes the particular sequence. If absent, a default
       sequence name is created from the file name. Flip supports the
       fasta sequence file format and its extensions, such as the
       masterfile format (see APPENDIX II) and the staden format. The
       only sequence characters allowed are "acgtnACGTN" and, in
       addition, the special characters "!", "@", "#", "+" and "-",
       which may be used for labelling particular nucleotides or
       sequence regions (see the masterfile format in APPENDIX II).
       Whitespace is ignored by flip.

       Note that flip will not allow file to be named 'prot.lst',
       'prot.src', 'nocompl', 'compl', 'prot.6rf' or 'prot.lst.dna'.

   Output files:
       Flip produces four to six files, depending on the parameters
       used:

       - nocompl: this file contains the same sequence as the input
         file, reformatted to 60 sequence characters per line
         (nucleotides and special characters), with the nucleotide
         count appearing in the left margin.

       - compl: this file contains the complemented and reversed
         sequence (briefly referred to as the complementary sequence),
         displayed in the same way as in the nocompl file.

       - prot.lst: the listing of all proteins potentially encoded by
         the DNA sequence in all 6 reading frames.
         By default, flip reports the amino acid sequence of every
         protein that is at least 48 residues long, independent of a
         particular start codon (this behaviour can be changed, see
         section "Parameters"). Also, the DNA sequence is assumed to be
         linear, but flip can support circular sequences (see section
         "Parameters"). The prot.lst file contains:

         - A header text which consists of:

           i)   The name of the DNA sequence from which the protein
                was deduced.

           ii)  An indicator of the strand on which the protein is
                encoded ("compl" = complementary strand and "orig" =
                original strand).

           iii) Coordinates of the protein-coding region of the DNA
                sequence: the first number is the position of the
                first nucleotide associated with the protein, and the
                second one is the position of the last nucleotide
                associated with the protein (stop codon included, if
                any).

           iv)  The nucleotide sequence context of the start and end
                of the protein-coding region: 12 nucleotides upstream
                of the 5' end and 12 nucleotides downstream of the 3'
                end, to facilitate locating the protein in the
                sequence file. The first and second dodecamers
                correspond to the first and second coordinates,
                respectively.

           Note that the coordinates and the displayed dodecamers
           always refer to the *forward* strand, no matter the strand
           on which the protein lies. Also, flip might display fewer
           than 12 characters for circular molecules.

         - If the -L switch was used (see Parameters), a line with a
           semicolon in the first column (a comment) containing all
           the amino acids upstream of the start codon.

         - The protein sequence itself (including the stop sign "*" if
           a stop codon is present in the DNA sequence), formatted to
           60 amino acids per line. Each line is preceded by the amino
           acid count.

         - The length of the protein, by itself on a single, last line.
         The order in which the proteins are reported in prot.lst is
         as follows: all proteins on the original strand are reported
         first (sorted in increasing order of start positions),
         followed by those on the complementary strand, sorted in
         decreasing order of start positions.

       - prot.src: this file is basically the same as prot.lst, but in
         a format that fasta interprets as a concatenation of all the
         proteins present in prot.lst; this is achieved by adding a
         semicolon in the first column of the header lines. No
         numbering is done and stop codons are not displayed. Also, a
         file header is added at the beginning of the file.

       - prot.6rf: this file is only produced when the "-f" option is
         used. It contains the translations of the 6 reading frames,
         displayed together with the DNA sequence.

       - prot.lst.dna: this file is only produced when the "-D" option
         is used. It contains the DNA sequence of all the orfs listed
         in prot.lst, in a layout that closely resembles prot.lst. The
         header of each orf indicates the start and end positions,
         along with the strand on which the orf lies, followed by the
         DNA sequence itself, formatted to 60 nucleotides (ACGTN) per
         line, with the nucleotide count appearing in front of each
         line.

       A genetic code passed on the command line (-g switch, see
       OPTIONS) takes precedence. If there is no -g switch on the
       command line, flip scans the file prot.prm (if any, see FILES)
       to find an appropriate genetic code using the masterfile name.
       If none can be found, genome_name.lst is consulted. If none of
       these lookups provides flip with a suitable genetic code, the
       "Basic" code (number 20, see APPENDIX I) is used for
       translation. It is also possible to specify deviations from the
       genetic code in use or to apply other translation codes (see
       "Parameters" and APPENDIX I).

   Parameters:
       In the absence of command line arguments other than the input
       file name, -g or -S, flip attempts to find a file named
       "prot.prm" in the current directory.
       This file contains values of parameters related to the
       application. If flip cannot find prot.prm, it uses the default
       parameter values. If command line switches other than -g or -S
       are used, prot.prm is not read. A typical prot.prm file follows:

           # All lines whose first non-blank character is "#" are comments
           # Blank lines ignored.
           # Everything is case-insensitive, except the "Deviation" line (see below)
           MinOrfLength = 20
           Circular = 1
           6RF = 1
           Starts = att
           Deviation = att=l,agg=*,tga=n

       Each line has the form id = value, where id can be:
       MinProtLength, MinOrfLength, Circular, 6RF, Deviation, Starts,
       GenCode (all case-insensitive). The effect of each line on
       flip's behaviour is explained below:

       MinOrfLength = length
              The minimum length (in amino acids) a protein must have
              in order to be reported by flip. Defaults to 48. Note
              that MinOrfLength and MinProtLength cannot be set at the
              same time. The corresponding command line switch is "-l"
              (see OPTIONS). Note that the ending stop codon (if any)
              is never taken into account during the length
              computation.

       MinProtLength = length
              The minimum length (in amino acids), counted from the
              first start codon occurring in a reading frame, a
              protein must have in order to be reported by flip. If
              the protein doesn't have a start codon, it is not
              reported. Note that MinOrfLength and MinProtLength
              cannot be set at the same time. By default,
              MinProtLength is 0, and so flip will not even check
              whether an orf has a start codon (in other words, as
              long as the minimum length imposed by MinOrfLength is
              respected, the protein is reported). The corresponding
              command line switch is "-L" (see OPTIONS). Note that the
              ending stop codon (if any) is never taken into account
              during the length computation.

       Circular = 0|1
              Assume that the sequence is circular if set to 1. Flip
              will verify that the input file contains only one
              sequence header. By default, the sequence is taken to be
              linear.
              The corresponding command line switch is "-c" (see
              OPTIONS).

       6RF = 0|1
              Produce the file prot.6rf if set to 1. By default, this
              file is not produced. The corresponding command line
              switch is "-f" (see OPTIONS).

       Starts = s_codons
              Use s_codons as the start codons, instead of the ones
              defined in the genetic code. s_codons is a
              case-insensitive list of codons separated by commas, of
              the form "ACT,ATT". Note that specifying new start
              codons doesn't alter the genetic code. It rather states
              that the given codon(s) are legal start codons and thus
              considered if MinProtLength is specified (see above).
              Note that if the start codon should be translated as a
              Met, the "-m" command line switch should be used. The
              command line switch used to redefine the start codons
              is "-s" (see OPTIONS).

       Deviation = dev_string
              Apply the specified deviations from the universal code.
              The value of dev_string must be a string of the form
              "atc=r,ttt=p,agt=\*" (note that the star is preceded by
              a backslash in order to avoid shell interpretation). All
              the codon ids are case-insensitive, but the aa ids are
              not (that is, codon aTT and AtT refer to the same codon,
              but when defining ATA=m,ATT=M, the protein sequence will
              contain upper and lowercase M's, according to the codons
              present in the DNA sequence). "-d" (see OPTIONS) can
              also be used to specify deviations from the current
              genetic code.

       GenCode = digit
              Use NCBI's code number digit instead of the default
              basic code to perform translation. Digit can be any
              number from 1 to 21 except 7 and 8. Refer to APPENDIX I
              to see all the genetic codes supported by flip. By
              default, the "Basic" genetic code is used for
              translation (code number 20, see APPENDIX I). "-g" can
              also be used to specify the genetic code to use during
              translation (see OPTIONS). If this line does not appear
              in prot.prm, the genetic code found in genome_name.lst
              for the masterfile is used.
              If the masterfile does not have a designated genetic
              code, the default ("Basic", code 20) is used.

OPTIONS
       -d dev_string
              Apply the specified deviations to the current code. The
              parameter of the "-d" switch must be a string of the
              form "atc=r,ttt=p,agt=\*" (note that the star is
              preceded by a backslash in order to avoid shell
              interpretation). All the codon ids are case-insensitive,
              but the aa ids are not (that is, codon aTT and AtT refer
              to the same codon, but when defining ATA=m,ATT=M, the
              protein sequence will contain upper and lowercase M's,
              according to the codons present in the DNA sequence).

       -g digit
              Use NCBI's code digit instead of the default basic code
              (code #20) to perform translation. Digit can be any
              number from 1 to 21 except 7 and 8. To see all the
              genetic codes supported by flip, see APPENDIX I. If this
              switch is not used, prot.prm is consulted to see if the
              current masterfile has an associated genetic code. If
              so, flip will use it. Otherwise, genome_name.lst (see
              FILES) is consulted. If flip does not find a genetic
              code in this file, it uses the default (code #20).

       -f     Produce the file prot.6rf. By default, this file is not
              produced.

       -c     Assume the sequence is circular. Flip will verify that
              the input file contains only one sequence header. By
              default, the sequence is taken to be linear.

       -s s_codons
              Use s_codons as the start codons, instead of the ones
              defined in the genetic code. s_codons is a
              case-insensitive list of codons separated by commas, of
              the form "ACT,ATT". Note that specifying new start
              codons doesn't alter the genetic code. It rather states
              that the given codon(s) are legal start codons and thus
              considered if the -L option (see below) is used. Note
              that if the start codon should be translated as a Met,
              the "-m" command line switch should be used.

       -l length
              The minimum length (in aa) a protein must have in order
              to be reported by flip. Defaults to 48.
              Note that -l and -L (see below) cannot be set at the
              same time. Note that the ending stop codon (if any) is
              never taken into account during the length computation.

       -L length
              The minimum length (in aa), counted from the first start
              codon occurring in a reading frame, a protein must have
              in order to be reported by flip. If the protein doesn't
              have a start codon, it is not reported. Note that -L and
              -l (see above) cannot be set at the same time. By
              default, flip will not even check whether an orf has a
              start codon (in other words, as long as the minimum
              length imposed by -l is respected, the orf is reported).
              Note that the ending stop codon (if any) is never taken
              into account during the length computation.

       -m     With this switch, flip translates the first codon of a
              protein as 'M' if that codon is a start codon. The
              default start codons for each genetic code are listed in
              APPENDIX I. By default, every start codon is translated
              as the amino acid it would encode at any position other
              than the first. Note that there is no corresponding
              identifier in prot.prm.

       -S     Silent mode. In this mode, flip will not display
              anything on stdout, except its version number. Note that
              there is no corresponding identifier in prot.prm.

       -n     Append "_<digit>" (where digit is the orf's rank) to the
              header of each orf listed in prot.lst, prot.src and
              prot.lst.dna (see above for a description of prot.lst's
              header).

       -D     Produce the file 'prot.lst.dna', which contains the DNA
              sequence of each orf reported in prot.lst. By default,
              this file is not produced.

       -F     Separate the contigs of nocompl and compl by a line
              containing only a "*". This is useful when these files
              are used as libraries by fasta(1): it ensures that fasta
              considers the sequences in these files as separate
              sequences.
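       The deviation strings accepted by "-d" (and by the Deviation
       line of prot.prm) can be illustrated with a short Python sketch.
       This is not flip's own code: the function names
       parse_deviations and apply_deviations are hypothetical, and the
       error handling is an assumption. It only shows the documented
       rule that codon ids are case-insensitive while amino acid ids
       keep their case.

```python
def parse_deviations(dev_string):
    """Parse a string such as "atc=r,ttt=p,agt=*" into {codon: aa}.

    Hypothetical helper, not part of flip. By the time a program sees
    the string, the shell has already removed the backslash before "*".
    """
    deviations = {}
    for item in dev_string.split(","):
        codon, sep, aa = item.strip().partition("=")
        codon = codon.strip().upper()  # codon ids are case-insensitive
        aa = aa.strip()                # aa ids keep their case
        malformed = (
            not sep
            or len(codon) != 3
            or set(codon) - set("ACGT")
            or len(aa) != 1
        )
        if malformed:
            # Rejecting bad items is an assumption of this sketch.
            raise ValueError(f"malformed deviation: {item!r}")
        deviations[codon] = aa
    return deviations


def apply_deviations(code_table, dev_string):
    """Return a copy of code_table ({codon: aa}) with deviations applied."""
    updated = dict(code_table)
    updated.update(parse_deviations(dev_string))
    return updated


if __name__ == "__main__":
    # A three-codon excerpt of a translation table, for illustration only.
    excerpt = {"ATT": "I", "AGG": "R", "TGA": "*"}
    print(apply_deviations(excerpt, "att=l,agg=*,tga=n"))
```

       Keeping the amino acid letter exactly as typed is what lets a
       definition such as ATA=m,ATT=M produce both lowercase and
       uppercase M's in the translated sequence.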
FILES
       prot.lst
              List of all the orfs found.

       prot.lst.dna
              DNA sequence of all the orfs reported in prot.lst.

       prot.src
              List of all the proteins found (fasta format).

       nocompl
              Original sequence (formatted).

       compl  Complemented reversed sequence (formatted).

       prot.6rf
              All six reading frames.

       prot.prm
              File containing some parameter values for the
              application.

       /share/supported/apps/ogmp/lib/genome_name.lst
              Consulted by flip when the -g switch is not used and
              when there is no genetic code specification in prot.prm
              (or no file prot.prm at all).

APPENDIX I
       In this section, all the genetic codes supported by flip are
       described. These codes are identical to the ones used at NCBI
       except for codes 16 to 21, which have been added by the OGMP. A
       given genetic code has five fields:

       i)   "Name" states the name of the genetic code.

       ii)  "Id" is the number used by flip to refer to this code.
            This number can be used as a parameter to the "-g" switch.

       iii) "Code". For each codon, the amino acid it codes for.

       iv)  "Starts". If an "M" appears in the column associated with
            a given codon, then this codon is a start codon according
            to the current genetic code.

       v)   "Codons". All the codons, in column representation: each
            column of three letters, read from top to bottom, spells
            one codon.

       There are 19 genetic codes, having ids from 1 to 6 and 9 to 21.
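       The column representation can be read mechanically. The Python
       sketch below is an illustration, not flip's implementation: the
       names build_tables and translate are hypothetical, and mapping
       any codon that is not in the table (for instance one containing
       N) to "X" is an assumption. It rebuilds the codon-to-amino-acid
       table of the "Basic" code (Id 20) from its "Code" string and
       the three "Codons" rows, and mimics the effect of the "-m"
       switch on the first codon.

```python
# The three "Codons" rows: first, second and third base of each codon.
BASE1 = "".join(base * 16 for base in "TCAG")
BASE2 = "".join(base * 4 for base in "TCAG") * 4
BASE3 = "TCAG" * 16

# "Code" row of the "Basic" code (Id 20), split by first base T, C, A, G.
CODE = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# "Starts" row with a single "M", in the ATG column (column 36 of 64).
STARTS = "-" * 35 + "M" + "-" * 28


def build_tables(code, starts):
    """Return ({codon: aa}, set_of_start_codons) for one genetic code."""
    if not (len(code) == len(starts) == 64):
        raise ValueError("Code and Starts rows must have 64 columns")
    codons = [a + b + c for a, b, c in zip(BASE1, BASE2, BASE3)]
    aa_of = dict(zip(codons, code))
    start_codons = {cod for cod, flag in zip(codons, starts) if flag == "M"}
    return aa_of, start_codons


def translate(dna, aa_of, start_codons=frozenset(), met_first=False):
    """Translate dna in frame 1; trailing partial codons are dropped."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = aa_of.get(codon, "X")  # "X" for unknown codons: an assumption
        if i == 0 and met_first and codon in start_codons:
            aa = "M"  # same idea as the "-m" switch
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    aa_of, start_codons = build_tables(CODE, STARTS)
    print(sorted(start_codons))
    print(translate("ATGGCCTGA", aa_of))
```

       Reading the rows this way makes it easy to compare two codes:
       only the "Code" and "Starts" strings change from one genetic
       code to the next, while the three "Codons" rows stay the same.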
       Name  : Standard
       Id    : 1
       Code  : FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: ---M---------------M---------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Vertebrate Mitochondrial
       Id    : 2
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
       Starts: --------------------------------MMMM---------------M------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Yeast Mitochondrial
       Id    : 3
       Code  : FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Mold Mitochondrial; Protozoan Mitochondrial;
               Coelenterate Mitochondrial; Mycoplasma; Spiroplasma
       Id    : 4
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: --MM---------------M------------MMMM---------------M------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Invertebrate Mitochondrial
       Id    : 5
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG
       Starts: ---M----------------------------MMMM---------------M------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
       Id    : 6
       Code  : FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Echinoderm Mitochondrial
       Id    : 9
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Euplotid Nuclear
       Id    : 10
       Code  : FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Bacterial
       Id    : 11
       Code  : FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: ---M---------------M------------MMMM---------------M------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Alternative Yeast Nuclear
       Id    : 12
       Code  : FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -------------------M---------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Ascidian Mitochondrial
       Id    : 13
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Flatworm Mitochondrial
       Id    : 14
       Code  : FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Blepharisma Macronuclear
       Id    : 15
       Code  : FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : TAG-Leu
       Id    : 16
       Code  : FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : TGA-Trp with GTG-initiation
       Id    : 17
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M---------------M------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : TGA-Trp
       Id    : 18
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Tetrahymena Mitochondrial
       Id    : 19
       Code  : FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: ---M----------------------------M-MM---------------M------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : Basic
       Id    : 20
       Code  : FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

       Name  : TAG-Leu,TCA-stop
       Id    : 21
       Code  : FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
       Starts: -----------------------------------M----------------------------
               TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
       Codons: TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
               TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

APPENDIX II
       This appendix describes briefly the masterfile format used by
       the OGMP. The masterfile (MF) provides an integration of DNA
       sequence and its feature annotations, with the feature
       annotations embedded at the corresponding coordinates in the
       sequence. The naming rules for genetic elements have been
       designed such that they are easily interpretable by a human
       reader, but are at the same time computer-readable. The OGMP MF
       format is a superset of the FASTA format.
       In collaboration with NCBI, we have developed tools that permit
       translation of an MF directly into a GenBank record (in asn.1
       format), and vice versa, facilitating sequence submission to
       NCBI. For more information on the MF format, please consult:

              http://megasun.bch.umontreal.ca/People/lang/ogmp-mf/intro.html

       Flip was first published in bionet.software. The reference is:

              Subject: FLIP: a Unix C program used to find/translate orfs
              From: Nicolas Brossard
              Date: 1997/11/25
              Message-ID: <347B3A1B.794BDF32@bch.umontreal.ca>
              Newsgroups: bionet.software

AUTHOR
       Nicolas Brossard (program design and coding of current version).
       Gertraud Burger (project management).
       B.Franz Lang and Gertraud Burger (Fortran version of flip,
       Nucl. Acids Res. 14, 455-465).
       Pierre Rioux (support in program design).

Organelle Genome Megasequencing Program, NOV97.