MF2MAP(1)                   OGMP SEQUENCE UTILITIES                   MF2MAP(1)

NAME
       mf2map - produce gene map coordinates of genes, introns and orfs of a
       given masterfile

SYNOPSIS
       mf2map [-h] [-i size] [-g size] [-t size] [-s size] [-o ocode]
              -m masterfile

DESCRIPTION
       The graphical representation of a circular genome map requires a file
       that describes the size of each element on the map:

       - genes and exons (clockwise and counter-clockwise)
       - introns (clockwise and counter-clockwise)
       - orfs (clockwise and counter-clockwise)

       Mf2map produces this file from the masterfile given as input. The
       masterfile is expected to be in a valid format (mf2map runs chkogmp(1)
       on it to verify this). Mf2map then runs mf2stad(1) to obtain the
       coordinates from which it computes the size of every item to be
       displayed on the map. This coordinate file is then checked and invalid
       annotations are removed. These include:

       - Annotations in an incorrect format
       - Annotations of unknown genes (i.e. those that do not exist in the
         file /share/supported/apps/ogmp/lib/gene_names.lst)
       - Annotations of mobile elements
       - Annotations of subparts of orfs (e.g. G-cob-I2-orf246-F2 ==> start)

       If mf2map finds two genes that overlap each other (say gene1 and
       gene2), it removes these two gene annotations and replaces them with:

       - An annotation for the part of gene1 not covered by gene2
       - An annotation for the part of gene2 not covered by gene1
       - A special "overlap" annotation (the part covered by both gene1 and
         gene2)

       Once the cleaning phase is done, mf2map computes the size of every
       item (including the size of the gaps separating consecutive items).
       There are some restrictions on the size that every item must have:

       - tRNAs must be exactly 70 bp long
       - Intergenic gaps and "overlap" slices must be at least 60 bp long
       - Every other item must be at least 50 bp long

       Note that these default sizes can all be changed with command line
       arguments (see OPTIONS).

       When a particular item doesn't meet the size requirements, mf2map
       tries to adjust it.
       It tries to resize the immediate neighbors of the "faulty" item so
       that the item's own size can be increased or decreased to a valid
       value. Mf2map can also decide to resize only one of the neighbors of
       the "faulty" item, but in this case that neighbor must be at least 100
       nucleotides long; otherwise mf2map will not try to resize it (this
       minimum can be changed with the -s switch). If mf2map cannot resize a
       given slice (generally because the neighboring slices are too small),
       it leaves the slice as it is.

       Once the adjustments are done, mf2map produces the file
       masterfile.prn, which contains all the information necessary to
       produce a genome map with charisma. Every line in the output file is a
       comma-separated list of fields:

       - The first field is always empty (it is set by the user in charisma).
       - The second field is for the markers.
       - The third and fourth fields are the sizes of the clockwise and
         counter-clockwise strands (if any).
       - The last field is the annotation associated with the strand. It is
         preceded by a "+" if the strand is clockwise and by a "-" otherwise
         (gaps are unsigned).

       All lines end with a "^M" (control-M, a carriage return) to comply
       with the DOS text file format.

       The first line of the output file is the header, which consists of the
       species associated with the masterfile name and the DNA type (mt, pt,
       etc.). This header must appear in the output file if the user plans to
       run mf2ps on it. It can be removed (see OPTIONS) if charisma is to be
       run on the output file instead.

OPTIONS
       -i size
              Use size as the minimum size of the gap separating two genes
              instead of the default 70. Note that if two genes overlap, size
              is also used as the minimum size of the overlapping part.

       -g size
              Use size as the minimum size of all genes, exons and introns
              (default 50).

       -t size
              Use size as the exact size that all tRNAs must have (default
              70).

       -o ocode
              Use ocode as the organism code for the masterfile passed as
              argument.
              This switch is not needed for projects listed in
              genome_name.lst but is mandatory for those that are not listed
              in this file.

       -s size
              Use size as the minimum size that an item must have if mf2map
              adjusts the size of this item (and only this item) in order to
              allow another one to have a valid size (default 100).

       -h     Do not put a header at the top of the output file indicating
              the species and the DNA type (mt, pt, etc.). The header is
              computed using the ocode and appears by default.

FILES
       /share/supported/apps/ogmp/lib/gene_name.lst
              A description of known mitochondrial genes.

       /share/supported/apps/ogmp/lib/perl/ogmp/gene_names.pl
              A perl library to read and parse the gene_name.lst file.

       /share/supported/apps/ogmp/bin/chkogmp
              A perl program to verify a masterfile's format.

       /share/supported/apps/ogmp/bin/mf2stad
              A perl program used to compute the coordinate file.

       /share/supported/apps/ogmp/lib/genome_name.lst
              List of all the known projects.

SEE ALSO
       ogmp(1), chkogmp(1), mf2stad(1)

AUTHOR
       BF Lang (conception)
       P Rioux (higher-level design)
       N Brossard (detailed design and programming)

       Organelle Genome Megasequencing Project.
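
EXAMPLES
       The replacement of two overlapping gene annotations by three slices,
       as described under DESCRIPTION, can be sketched in Python as follows.
       This is only an illustration and not the code that mf2map runs: the
       function name split_overlap, the (name, start, end) tuple layout, the
       inclusive coordinates and the gene names cox1 and nad1 are all
       assumptions made for the example. Only a partial overlap is handled;
       the case of one gene lying entirely inside the other is not covered.

```python
def split_overlap(gene1, gene2):
    """Split two partially overlapping genes into three slices.

    Each gene is a hypothetical (name, start, end) tuple with inclusive
    coordinates; the man page does not state the real convention.
    """
    # Order the two genes by start position so the first one begins upstream.
    (name1, start1, end1), (name2, start2, end2) = sorted(
        [gene1, gene2], key=lambda gene: gene[1]
    )
    # Accept only a partial overlap: the second gene starts inside the
    # first one and extends past its end.
    if not (start1 < start2 <= end1 < end2):
        raise ValueError("genes do not partially overlap")
    return [
        (name1, start1, start2 - 1),  # part of gene1 not covered by gene2
        ("overlap", start2, end1),    # part covered by both genes
        (name2, end1 + 1, end2),      # part of gene2 not covered by gene1
    ]


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to show the three slices.
    for name, start, end in split_overlap(("cox1", 100, 400), ("nad1", 350, 700)):
        print(name, start, end, end - start + 1)
```

       The last column printed is the length of each slice, which is the
       quantity that the size restrictions described above apply to.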