Overview of feature keys and their syntax:

  GENE & ORF         G-"gene_name" & G-ORF#
  EXON               G-"gene_name"-E#
  INTRON             G-"gene_name"-I#
  RNA                G-RNA-"rna_name"
  FRAGMENT           G-"gene_name"-F#; G-"gene_name"-P#
  SIGNAL             G-Sig-"signal_name"
  MOTIF              G-Mot-"motif_name"
  MOBILE_ELEMENT     G-Mob-"element_name"
  VARIATION          G-Var-"variation_name"
  CITATION           G-Cit-"citation_name"
  SOURCE             (currently not supported)

-------------------------------------------------------------------------------
List of all qualifiers
-------------------------------------------------------------------------------
Feature Key           GENE & ORF
Syntax                G-"gene_name"
                      (Corresponding GenBank feature)
Definition            coding regions of protein genes (CDS), rRNAs or tRNAs
Optional qualifiers   /citation = [number]
                      /codon = seq:"codon-seq",aa:  (codon reassignment in a particular gene)
                      /copy_number = [number]       (Note: _1, _2, etc. have been abandoned
                                                     and replaced by /copy_number=1, etc.)
                      /elongator                    (for tRNA-Met genes)
                      /endo                         (endonuclease other than GIY-YIG,
                                                     LAGLIDADG or OMEGA)
                      /inframe =                    (intron ORF in frame with the upstream exon)
                      /initiator                    (for tRNA-Met genes)
                      /intronic                     (for intronic ORFs)
                      /first_aa =                   (only if not Met)
                      /fragment_number = [number]   (Note: _a, _b, etc. have been abandoned
                                                     and replaced by /fragment_number=1, etc.)
                      /function = "text"
                      /GIY-YIG
                      /LAGLIDADG
                      /note = "text"
                      /OMEGA
                      /partial
                      /product = "text"
                      /pseudo
                      /rev_trans
                      /standard_name = "text"
                      /synonym = "gene_name"
                      /ymf#                         (for ORFs)
                      /tRNA-editing                 (for tRNA genes)
Comment
-------------------------------------------------------------------------------
Feature Key           EXON
Syntax                G-"gene_name"-E#
                      (Corresponding GenBank feature)
Definition            region of the genome that codes for a portion of a spliced
                      mRNA; does not include the 5' UTR or the 3' UTR
Optional qualifiers   /citation = [number]
                      /codon_start = #
                      /label = feature_label
                      /map = "text"
                      /note = "text"
                      /partial
                      /pseudo
Comment
-------------------------------------------------------------------------------
Feature Key           INTRON
Syntax                G-"gene_name"-I#
                      (Corresponding GenBank feature)
Definition            a segment of DNA that is transcribed but removed from the
                      transcript by splicing together the sequences (exons) on
                      either side of it. A special case is trans-splicing, where
                      the fragmented intron is encoded at distant genomic
                      locations and the intron structure is completed by
                      complementation at the RNA level.
Optional qualifiers   /altern_splice = (5'site:,3'site:)
                      /citation = [number]
                      /cons_splice = (5'site:,3'site:)
                      /group                        (# = 1, 1a, 1b, ..., 2, 3)
                      /twintron
                      /hompos = "X,Y"               (conserved insertion site; X = aa or nt
                                                     position, Y = species abbreviation)
                      /label = feature_label
                      /map = "text"
                      /note = "text"
                      /partial
Comment               cons_splice is used only when one of the intron's splice
                      sites does not match the GT...AG consensus.
                      The inner intron in a twintron is designated -II.
-------------------------------------------------------------------------------
Feature Key           RNA
Syntax                G-RNA
                      (Corresponding GenBank feature: not worked out)
Definition            describes RNAs other than tRNAs, rRNAs, and those of
                      unknown function
Optional qualifiers   /D-loop
                      /partial
                      /gene
                      /mRNA*
                      /join                         (for fragments)
                      /polyA_site
                      /polyA_signal*
                      /precursor_RNA*
                      /primary_transcript
                      /trans_splice*
                      /mod_base*
                      /shape*
                      /attenuator*
                      /promoter*
                      /RBS*
Comment               Qualifiers marked with "*" are not yet supported.
-------------------------------------------------------------------------------
Feature Key           FRAGMENT and PART
Syntax                G-"gene_name"-F#
                      G-"gene_name"-P#
                      (Corresponding GenBank feature)
Definition            fragments of genetic elements that are interrupted by
                      insertion elements
Optional qualifiers   /citation = [number]
                      /join
                      /label = feature_label
                      /map = "text"
                      /note = "text"
                      /partial
Comment               There is a fine conceptual difference between the feature
                      key FRAGMENT and the qualifier /fragment_number. When a
                      gene is broken up into pieces that are rearranged in the
                      genome, we use the feature key GENE with the qualifier
                      /fragment_number. Alternatively, the gene suffixes _a, _b
                      ("gene_name"_a, etc.) can be used to distinguish fragments
                      1, 2, etc. When a gene (or its reading frame) is
                      interrupted, we use the feature key FRAGMENT. In this case
                      the entire coding region must be enclosed by a gene
                      annotation, in this order:

                        gene start
                          gene-F1 start ... gene-F1 end
                          gene-F2 start ... gene-F2 end
                        gene end

                      The key PART can be used as an interim solution as long as
                      not all pieces of a gene are known, so that fragments _a,
                      _b, etc. cannot yet be assigned. For example, if the gene
                      pieces corresponding to amino acids 1-30 and 50-100 have
                      been sequenced, but not yet the piece covering aa 31-49,
                      these gene fragments would be annotated gene-P10 and
                      gene-P50.
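The nesting rule for FRAGMENT lends itself to a mechanical check. The Python sketch below parses concrete gene-based keys (G-cox1, G-cox1-E2, G-cox1-F1, ...) and reports fragments that fall outside their enclosing gene or that do not follow one another in order. It is a hypothetical illustration, not part of any MasterFile tool: the names `Feature`, `parse_feature` and `check_fragments`, the assumption that the quotes in G-"gene_name" only mark a placeholder, the assumption that gene names contain no hyphen, and the use of plain forward-strand start/end coordinates are all assumptions made for the example. The twintron suffix -II is not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for concrete gene-based keys such as G-cox1,
# G-cox1-E2, G-cox1-I1, G-cox1-F1 or G-cox1-P50.
# Assumption: gene names contain no hyphen or whitespace.
GENE_KEY = re.compile(r"^G-(?P<gene>[^-\s]+)(?:-(?P<kind>[EIFP])(?P<num>\d+))?$")


@dataclass
class Feature:
    gene: str
    kind: Optional[str]  # None for the gene itself, else "E", "I", "F" or "P"
    num: Optional[int]   # the # in -E#, -I#, -F#, -P#
    start: int           # assumed forward-strand coordinates, start <= end
    end: int


def parse_feature(key: str, start: int, end: int) -> Feature:
    """Split a gene-based key into gene name, suffix kind and number."""
    m = GENE_KEY.match(key)
    if m is None:
        raise ValueError(f"not a gene-based feature key: {key!r}")
    num = int(m["num"]) if m["num"] else None
    return Feature(m["gene"], m["kind"], num, start, end)


def check_fragments(features: list[Feature]) -> list[str]:
    """Report -F# fragments that break the nesting rule.

    Each gene-F# must lie inside the gene annotation of the same name,
    and F1, F2, ... must follow one another without overlapping.
    """
    problems = []
    genes = {f.gene: f for f in features if f.kind is None}
    fragments: dict[str, list[Feature]] = {}
    for f in features:
        if f.kind == "F":
            fragments.setdefault(f.gene, []).append(f)

    for name, parts in fragments.items():
        gene = genes.get(name)
        if gene is None:
            problems.append(f"{name}: fragments without an enclosing gene")
            continue
        parts.sort(key=lambda p: p.num)
        for p in parts:
            if not (gene.start <= p.start and p.end <= gene.end):
                problems.append(f"{name}-F{p.num}: outside gene span")
        # consecutive fragments: the next one must start after the previous ends
        for a, b in zip(parts, parts[1:]):
            if b.start <= a.end:
                problems.append(f"{name}-F{b.num}: does not follow F{a.num}")
    return problems


features = [
    parse_feature("G-cox1", 100, 900),
    parse_feature("G-cox1-F1", 100, 400),
    parse_feature("G-cox1-F2", 600, 900),
]
print(check_fragments(features))  # prints []
```

Keys of other types, such as G-Sig-"signal_name", do not match the pattern and raise `ValueError`, so a real checker would dispatch on the prefix first.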
-------------------------------------------------------------------------------
Feature Key           MOTIF
Syntax                G-Mot-"motif_name"
                      (Corresponding GenBank feature)
Definition            sequence motifs of unknown function
Optional qualifiers   /A+T_rich
                      /dispersed
                      /G+T_rich
                      /inversion
                      /LTR
                      /note = "text"
                      /organization = "text"
                      /repeat_element
                      /repetitive
                      /stem_loop
                      /tandem
Comment               Conceptual overlap with Signal & Site. MOTIF is used
                      particularly for annotating repeat elements, whereas
                      SIGNAL is useful for describing arrays. Note that promoter
                      is a qualifier of SIGNAL, while stem_loop is a qualifier
                      of MOTIF. These two qualifiers cannot be assigned to a
                      single MF feature key, because that would cause a conflict
                      when converting to the GenBank data model (in which
                      promoter and stem_loop are both feature keys).
-------------------------------------------------------------------------------
Feature Key           SIGNAL
Syntax                G-Sig-"signal_name"
                      (Corresponding GenBank feature)
Definition            signals, sites and conserved regions
Optional qualifiers   /D-loop
                      /LTR
                      /note = "text"
                      /ORI                          (replication origin; motif defining
                                                     the signal)
                      /organization = "text"
                      /promoter*
                      /putative
                      /RBS*
                      /recomb                       (recombination site or region)
                      /repeat_unit = "text"
                      /stem_loop
                      /telomere                     (telomere region)
Positions and sites   (to be used with the "==> point" operator)
                      /RNAinit                      (transcription start)
                      /RNAterm                      (transcription termination)
                      /REPinit                      (replication start)
                      /RNA-3'                       (3' end of an RNA)
                      /RNA-5'                       (5' end of an RNA)
                      /RNAproc                      (used together with /RNA-5' to indicate
                                                     that the 5' end results from RNA
                                                     processing, not from transcription
                                                     initiation)
Comment
-------------------------------------------------------------------------------
Feature Key           MOBILE_ELEMENT
Syntax                G-Mob-"element_name"
                      (Corresponding GenBank feature)
Definition            mobile sequence elements
Optional qualifiers   /copia #                      (Drosophila element)
                      /endo #                       (endonuclease other than GIY-YIG,
                                                     LAGLIDADG or OMEGA)
                      /GIY-YIG
                      /LAGLIDADG
                      /note = "text"
                      /OMEGA
                      /partial
                      /putative                     (equivalent to /evidence=not_experimental)
                      /repetitive
                      /retro #                      (retrotransposon)
                      /transposon
                      /TY #                         (yeast element)
-------------------------------------------------------------------------------
Feature Key           VARIATION
Syntax                G-Var-"var_name"
                      (Corresponding GenBank feature)
Definition            sequence variation of any nature, at the RNA or DNA level
Mandatory qualifiers  one of the following:
                      /deletion
                      /inversion
                      /substitution G==>X,Y,Z       (single base substitution)
Optional qualifiers   /mutation
                      /polymorph
                      /codon_altern [codon, including IUB code]
                      /editing                      (RNA editing)
                      /RNAmodif
                      /DNAmodif
                      /missence                     (substitution in translated sequence)
                      /note = "text"
                      /nonsence                     (base substitution in translated sequence)
                      /silent                       (base substitution in translated sequence)
                      /transl_altern = "aa,aa"      (base substitution in translated sequence;
                                                     aa = amino acid in IUPAC one-letter code)
Positions and sites   (to be used with the "==> point" operator)
                      /insertion = "sequence"
Comment               Further qualifiers that describe the strain and other
                      details of the variation will be added.
-------------------------------------------------------------------------------
Feature Key           CITATION
Syntax                G-citation-"citation_name"
                      (Corresponding GenBank feature: not worked out)
Definition            relates a sequence segment to a journal citation
Optional qualifiers   /citation = [number] = "text"
Comment               Used to specify sequence segments that have been published
                      earlier. Note that there is also a qualifier /citation for
                      the feature key GENE.
-------------------------------------------------------------------------------
Feature Key           SOURCE (currently not supported)
Syntax                see example (in masterfile header)
Definition            identifies the biological source of the specified span of
                      the sequence. This key is mandatory: every entry will
                      have, as a minimum, a single source key spanning the
                      entire sequence.
                      More than one source key per sequence is permissible.
Mandatory qualifiers  /organism = "text"
Optional qualifiers   /cell_line = "text"
                      /cell_type = "text"
                      /chromosome = "text"
                      /citation = [number]
                      /clone = "text"
                      /clone_lib = "text"
                      /cultivar = "text"
                      /cyanelle = "text"
                      /dev_stage = "text"
                      /frequency = "text"
                      /germline
                      /halotype = "text"
                      /lab_host = "text"
                      /isolate = "text"
                      /kinetoplast
                      /label = feature_label
                      /macronuclear
                      /map = "text"
                      /mitochodrion
                      /note = "text"
                      /plasmid = "text"
                      /pop_variant = "text"
                      /proviral
                      /rearranged
                      /sex = "text"
                      /sequenced_mol = "text"
                      /specific_host = "text"
                      /strain = "text"
                      /sub_clone = "text"
                      /sub_species = "text"
                      /sub_strain = "text"
                      /tissue_lib = "text"
                      /tissue_type = "text"
                      /variety = "text"
                      /plastid
                      /nuclear
                      /nucleomorph
                      /shape*
                      /complete
                      /translate_table#
Molecule scope        any
Comment               multiple qualifiers (e.g., /clone=)
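All feature keys above take qualifiers written either as a bare flag (/partial, /RNA-5') or as /name = value, where the value is "text", a [number], or a bare token such as the # of /codon_start. The Python sketch below tokenizes such a qualifier string into (name, value) pairs. The regular expression and the function name `parse_qualifiers` are hypothetical and not taken from any MasterFile tool; forms with more internal structure, such as /codon = seq:"codon-seq",aa: or /substitution G==>X,Y,Z, are not handled and would need dedicated rules.

```python
import re
from typing import Union

# Hypothetical pattern for "/name" flags and "/name = value" qualifiers.
# A value is "quoted text", a [number], or a bare token without spaces.
QUALIFIER = re.compile(
    r'/(?P<name>[^\s=/"]+)'
    r'(?:\s*=\s*(?P<value>"[^"]*"|\[\d+\]|[^\s/"]+))?'
)

Value = Union[bool, int, str]


def parse_qualifiers(text: str) -> list[tuple[str, Value]]:
    """Split a qualifier string into (name, value) pairs.

    Flags map to True, [number] to an int, "text" to the unquoted
    string; any other value is kept as a bare string.
    """
    pairs: list[tuple[str, Value]] = []
    for m in QUALIFIER.finditer(text):
        name, raw = m["name"], m["value"]
        if raw is None:
            value: Value = True          # flag such as /partial
        elif raw.startswith('"'):
            value = raw[1:-1]            # "text" -> text
        elif raw.startswith("["):
            value = int(raw[1:-1])       # [3] -> 3
        else:
            value = raw                  # bare token, left as a string
        pairs.append((name, value))
    return pairs


example = '/partial /note = "start codon uncertain" /citation = [3] /codon_start = 2'
for name, value in parse_qualifiers(example):
    print(name, repr(value))
```

A bare value such as the 2 of /codon_start stays a string here; converting it according to each qualifier's expected type is left to the caller.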