Information Management in an Organelle Genome Megasequencing Project

Tim Littlejohn, Organelle Genome Megasequencing Program

tim@bch.umontreal.ca

Possibly the largest group of users of molecular biology databases is the genome sequencing community. While a number of genomics projects are well underway, few are presently producing large quantities of comparative genomic data. The Organelle Genome Megasequencing Project (OGMP), centred at the University of Montreal, is unusual in that it generates large quantities of complete genomic sequence information (presently focused on mitochondrial genomes) suitable for comparative genomic analysis. The Megasequencing Unit of the OGMP currently completes between 3 and 6 entire genomes per year, and with predicted improvements in sequencing technology, this output should double within the next year.

The OGMP shares many of the data management challenges experienced by other genomic groups, for example gene identification and function inference, gene product structural prediction, and integration of molecular, genetic, biochemical, and genome organisation data. Because of the phylogenetic emphasis of the project, the OGMP has a major focus on comparative genomics. Hence, the information retrieval and data management requirements of the project are both wide and deep.

The Informatics Division of the OGMP (OGMP-ID) is concerned with data management and information retrieval from a wide variety of sources. Specific aims of the OGMP-ID include development of informatics tools for accessing existing large networked databases, provision of specialist databases to the wider scientific community, enhancement of existing genome analysis toolsets, and development of tools for large sequence project management and comparative genomics analysis (with an emphasis on phylogenetics). Key issues for the OGMP-ID are the management of genomic data and its efficient, integrated, global distribution in a form readily accessible to researchers in the field.

Current projects of the OGMP-ID include development of a number of tools for accessing diverse molecular sequence databases. These include:

In addition, integrated genomic databases (initially with an emphasis on organelle genomes and phylogenetics) and image databases (covering organisms of phylogenetic interest, with both macro and ultrastructural information) are under construction. Furthermore, through various Internet communication methods (Gopher, World Wide Web, etc.), the OGMP-ID participates in the global movement for unification of genomic data organisation, elimination of non-essential redundancy in database and software development, and communication between researchers in the field of information management for genomics.
