Overview of Feature keys and their syntax:

    GENE & ORF      (G-"gene_name" & G-ORF#)
    EXON            (G-"gene_name"-E#)
    INTRON          (G-"gene_name"-I#)
    RNA             (G-RNA-"rna_name")
    FRAGMENT        (G-"gene_name"-F#; G-"gene_name"-P#)
    SIGNAL          (G-Sig-"signal_name")
    MOTIF           (G-Mot-"motif_name")
    MOBILE_ELEMENT  (G-Mob-"element_name")
    VARIATION       (G-Var-"variation_name")
    CITATION        (G-Cit-"citation_name")
    SOURCE          (currently not supported)

-------------------------------------------------------------------------------
List of all qualifiers : Qualifiers
-------------------------------------------------------------------------------
Feature Key     GENE & ORF
Syntax          G-"gene_name" (Corresponding GenBank feature)
Definition      coding regions of protein genes (CDS), rRNAs or tRNAs
Optional qualifiers
                /citation = [number]
                /codon = seq:"codon-seq",aa:
                                  (codon reassignment in a particular gene)
                /copy_number = [number]
                                  (Note: _1, _2, etc. have been abandoned and
                                  replaced by ... /copy_number=1 etc)
                /elongator        (for tRNA-Met genes)
                /endo             (endonuclease, other than GIY.,LAGLI.,OMEGA)
                /inframe =        (intron orf, in frame with upstream exon)
                /initiator        (for tRNA-Met genes)
                /intronic         (for intronic ORFs)
                /first_aa =       (only if not Met)
                /fragment_number = [number]
                                  (Note: _a, _b, etc. have been abandoned and
                                  replaced by ... /fragment_number=1 etc)
                /function = "text"
                /GIY-YIG
                /LAGLIDADG
                /note = "text"
                /OMEGA
                /partial
                /product = "text"
                /pseudo
                /rev_trans
                /standard_name = "text"
                /synonym = "gene_name"
                /ymf#             (for orfs)
                /tRNA-editing     (for tRNA genes)
Comment
-------------------------------------------------------------------------------
Feature Key     EXON
Syntax          G-"gene_name"-E# (Corresponding GenBank feature)
Definition      region of genome that codes for a portion of spliced mRNA;
                does not contain 5' UTR and 3' UTR
Optional qualifiers
                /citation = [number]
                /codon_start = #
                /label = feature_label
                /map = "text"
                /note = "text"
                /partial
                /pseudo
Comment
-------------------------------------------------------------------------------
Feature Key     INTRON
Syntax          G-"gene_name"-I# (Corresponding GenBank feature)
Definition      a segment of DNA that is transcribed, but removed from within
                the transcript by splicing together the sequences (exons) on
                either side of it. A special case is trans-splicing, where
                the fragmented intron is coded at distant genomic locations,
                and complementation of the intron structure occurs on the
                RNA level.
Optional qualifiers
                /altern_splice = (5'site:,3'site:)
                /citation = [number]
                /cons_splice = (5'site:,3'site:)
                /group            (#=1,1a,1b,..,2,3)
                /twintron
                /hompos = "X,Y"   (conserved insertion site; X=aa.pos. or
                                  nt.pos; Y=species_abbreviation)
                /label = feature_label
                /map = "text"
                /note = "text"
                /partial
Comment         cons_splice is used only when one of the intron's splice
                sites does not match the GT...AG consensus.
                The inner intron in a twintron is designated -II
-------------------------------------------------------------------------------
Feature Key     RNA
Syntax          G-RNA (Corresponding GenBank feature) not worked
Definition      describes RNAs other than tRNAs, rRNAs, and those of unknown
                function
Optional qualifiers
                /D-loop
                /partial
                /gene
                /mRNA*
                /join             (for fragments)
                /polyA_site
                /polyA_signal*
                /precursor_RNA*
                /primary_transcript
                /trans_splice*
                /mod_base*
                /shape*
                /attenuator*
                /promoter*
                /RBS*
Comment         Those marked by a "*" are not yet supported.
-------------------------------------------------------------------------------
Feature Key     FRAGMENT and PART
Syntax          G-"gene_name"-F# (Corresponding GenBank feature)
                G-"gene_name"-P#
Definition      fragments of genetic elements that are interrupted by
                insertion elements
Optional qualifiers
                /citation = [number]
                /join
                /label = feature_label
                /map = "text"
                /note = "text"
                /partial
Comment         There is a fine conceptual difference between the key
                FRAGMENT and the qualifier "/fragment_number".

                When a gene is broken up into pieces that are rearranged in
                the genome, we would use the feature key "gene" with the
                qualifier "/fragment_number". Alternatively, gene suffixes
                _a, _b ("gene_name"_a etc.) can be used to distinguish
                fragment 1, 2 etc.

                When a gene (or its reading frame) is interrupted, we would
                use the feature key "fragment". Here, the entire coding
                region must be embraced by a gene annotation
                (gene start, gene-F1 start, gene-F1 end, gene-F2 start,
                gene-F2 end, gene end).

                The key PART can be used as an interim solution, as long as
                not all pieces of a gene are known, and therefore fragments
                _a, _b, etc cannot yet be assigned. For example, if the gene
                pieces corresponding to amino acids 1-30 and 50-100 have been
                sequenced, but not yet those including aa. 31-49, these gene
                fragments would be annotated gene-P10 and gene-P50.
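
The nesting rule for interrupted genes (gene start, gene-F1 start, gene-F1 end, gene-F2 start, gene-F2 end, gene end) lends itself to a mechanical check. The following is a minimal Python sketch, not part of the annotation format itself. It assumes the annotations have already been reduced to a list of (feature_name, "start" | "end") tuples in sequence order; that representation, the function name fragments_enclosed, and the example gene name cox1 are hypothetical.

```python
import re

# Matches fragment keys of the form <gene>-F<number>, e.g. "G-cox1-F2".
FRAGMENT_RE = re.compile(r"^(?P<gene>.+)-F(?P<num>\d+)$")


def fragments_enclosed(events, gene):
    """Return True if all <gene>-F# start/end pairs lie inside the gene's
    own start/end pair and do not overlap one another.

    `events` is a list of (feature_name, boundary) tuples in sequence
    order, where boundary is "start" or "end" (an assumed representation).
    """
    inside_gene = False
    seen_gene = False
    open_fragment = None
    for name, boundary in events:
        if name == gene:
            if boundary == "start":
                if inside_gene:
                    return False          # gene opened twice
                inside_gene = True
                seen_gene = True
            else:
                if not inside_gene or open_fragment is not None:
                    return False          # gene closed too early
                inside_gene = False
            continue
        match = FRAGMENT_RE.match(name)
        if match is None or match.group("gene") != gene:
            continue                      # unrelated feature, ignore
        if not inside_gene:
            return False                  # fragment outside the gene span
        if boundary == "start":
            if open_fragment is not None:
                return False              # previous fragment still open
            open_fragment = name
        else:
            if open_fragment != name:
                return False              # end without matching start
            open_fragment = None
    return seen_gene and not inside_gene and open_fragment is None


ok = [("G-cox1", "start"), ("G-cox1-F1", "start"), ("G-cox1-F1", "end"),
      ("G-cox1-F2", "start"), ("G-cox1-F2", "end"), ("G-cox1", "end")]
bad = [("G-cox1-F1", "start"), ("G-cox1", "start"),
       ("G-cox1-F1", "end"), ("G-cox1", "end")]
print(fragments_enclosed(ok, "G-cox1"), fragments_enclosed(bad, "G-cox1"))
```

The second list fails because the first fragment opens before the enclosing gene annotation does.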
-------------------------------------------------------------------------------
Feature Key     MOTIF
Syntax          G-Mot-"motif_name" (Corresponding GenBank feature)
Definition      sequence motifs of unknown function
Optional qualifiers
                /A+T_rich
                /dispersed
                /G+T_rich
                /inversion
                /LTR
                /note = "text"
                /organization = "text"
                /repeat_element
                /repetitive
                /stem_loop
                /tandem
Comment         Conceptual overlap with Signal & Site. Motif is used
                particularly for annotation of repeat elements, whereas
                Signal is useful for the description of arrays. Note that
                promoter is a qualifier of Signal, while stem_loop is a
                qualifier of Motif. These two qualifiers cannot be assigned
                to a single MF-feature key, because this would result in a
                conflict when converting to the GenBank data model (in which
                promoter and stem_loop are both feature keys).
-------------------------------------------------------------------------------
Feature Key     SIGNAL
Syntax          G-Sig-"signal_name" (Corresponding GenBank feature)
Definition      signals, sites and conserved regions
Optional qualifiers
                /D-loop
                /LTR
                /note = "text"
                /ORI              (repl.origin, motif defining the signal)
                /organization = "text"
                /promoter*
                /putative
                /RBS*
                /recomb           (recombination site or region)
                /repeat_unit = "text"
                /stem_loop
                /telomere         (telomere region)
Positions and sites (to be used with the "==> point" operator)
                /RNAinit          (transcription start)
                /RNAterm          (transcription termination)
                /REPinit          (replication start)
                /RNA-3'           (3' end of an RNA)
                /RNA-5'           (5' end of an RNA)
                /RNAproc          (qualifier used in conjunction with RNA-5',
                                  to indicate that the 5' end is the result
                                  of RNA processing, not a transcription
                                  start)
Comment
-------------------------------------------------------------------------------
Feature Key     MOBILE_ELEMENT
Syntax          G-Mob-"element_name" (Corresponding GenBank feature)
Definition      mobile sequence elements
Optional qualifiers
                /copia            # (Drosophila element)
                /endo             # (endonuclease, other than the 3 above)
                /GIY-YIG
                /LAGLIDADG
                /note = "text"
                /OMEGA
                /partial
                /putative         (equivalent to /evidence=not_experimental)
                /repetitive
                /retro            # (retrotransposon)
                /transposon
                /TY               # (yeast element)
-------------------------------------------------------------------------------
Feature Key     VARIATION
Syntax          G-Var-"var_name" (Corresponding GenBank feature)
Definition      sequence variation of any nature, at the RNA or DNA level
Mandatory qualifiers (one of the following):
                /deletion
                /inversion
                /substitution G==>X,Y,Z
                                  (single base substitution)
Optional qualifiers
                /mutation
                /polymorph
                /codon_altern [codon, including IUB code]
                /editing          (RNA editing)
                /RNAmodif
                /DNAmodif
                /missence         (for substitution in translated seq.)
                /note = "text"
                /nonsence         (base substitution in translated seq.)
                /silent           (base substitution in translated seq.)
                /transl_altern = "aa,aa"
                                  (base substitution in translated seq.;
                                  aa=amino acid in IUPAC 1-letter code)
Positions and sites (to be used with the "==> point" operator)
                /insertion = "sequence"
Comment         further qualifiers that describe the strain and other
                details of variation will be added
-------------------------------------------------------------------------------
Feature Key     CITATION
Syntax          G-citation-"citation_name" (Corresponding GenBank feature)
                not worked
Definition      relates a sequence segment to a journal citation
Optional qualifiers
                /citation = [number] = "text"
Comment         used to specify sequence segments that have been published
                earlier. Note that there also exists a qualifier /citation
                for the feature key Gene.
-------------------------------------------------------------------------------
Feature Key     SOURCE - currently not supported
Syntax          see example (in masterfile header)
Definition      identifies the biological source of the specified span of
                the sequence. This key is mandatory. Every entry will have,
                as a minimum, a single source key spanning the entire
                sequence.
                More than one source key per sequence is permissible.
Mandatory qualifiers
                /organism="text"
Optional qualifiers
                /cell_line = "text"
                /cell_type = "text"
                /chromosome = "text"
                /citation = [number]
                /clone = "text"
                /clone_lib="text"
                /cultivar = "text"
                /cyanelle = "text"
                /dev_stage = "text"
                /frequency = "text"
                /germline
                /halotype = "text"
                /lab_host = "text"
                /isolate = "text"
                /kinetoplast
                /label = feature_label
                /macronuclear
                /map = "text"
                /mitochodrion
                /note = "text"
                /plasmid = "text"
                /pop_variant = "text"
                /proviral
                /rearranged
                /sex = "text"
                /sequenced_mol = "text"
                /specific_host = "text"
                /strain = "text"
                /sub_clone = "text"
                /sub_species = "text"
                /sub_strain = "text"
                /tissue_lib = "text"
                /tissue_type = "text"
                /variety = "text"
                /plastid
                /nuclear
                /nucleomorph
                /shape*
                /complete
                /translate_table#
Molecule scope  any
Comment         multiple qualifiers (e.g., /clone=)
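
The feature-key syntax summarized in the overview can be recognized with a handful of patterns. The sketch below is only an illustration in Python: it assumes that names are written without the quotation marks shown in the syntax lines, that the prefixes follow the overview (G-Sig-, G-Mot-, G-Mob-, G-Var-, G-Cit-, G-RNA-), and that anything else beginning with G- is a plain gene. The function name classify and the example names are hypothetical, and the sketch does not handle the -II designation of inner twintron introns.

```python
import re

# Prefix-style keys, as listed in the overview: G-<Prefix>-<name>
_PREFIXED = {
    "Sig": "SIGNAL",
    "Mot": "MOTIF",
    "Mob": "MOBILE_ELEMENT",
    "Var": "VARIATION",
    "Cit": "CITATION",
    "RNA": "RNA",
}

# Suffix-style keys: G-<gene_name>-<Letter><number>
_SUFFIXED = {"E": "EXON", "I": "INTRON", "F": "FRAGMENT", "P": "PART"}


def classify(feature):
    """Return (feature_key, name, number) for a feature string,
    or None if the string does not start with "G-"."""
    if not feature.startswith("G-"):
        return None
    rest = feature[2:]
    if not rest:
        return None

    match = re.fullmatch(r"ORF(\d+)", rest)
    if match:
        return ("ORF", None, int(match.group(1)))

    match = re.fullmatch(r"(Sig|Mot|Mob|Var|Cit|RNA)-(.+)", rest)
    if match:
        return (_PREFIXED[match.group(1)], match.group(2), None)

    match = re.fullmatch(r"(.+)-([EIFP])(\d+)", rest)
    if match:
        return (_SUFFIXED[match.group(2)], match.group(1), int(match.group(3)))

    # Fallback assumption: any other G-<name> is a gene.
    return ("GENE", rest, None)


for example in ["G-cox1", "G-ORF123", "G-cox1-E2", "G-nad5-I1",
                "G-cox1-P50", "G-Sig-ori", "cox1"]:
    print(example, "->", classify(example))
```

Prefix-style keys are tested before suffix-style keys so that a signal or motif whose name happens to end in -E1 is not mistaken for an exon.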