GSC96, Annual Meeting of the Genetic Society of Canada.
University of Manitoba, Winnipeg, Manitoba, 12-15 June 1996.

Bioinformatics projects of the Organelle Genome and Database Program

G. Burger, R.J. Cedergren, B. Golding, M.W. Gray, M. Laskowska, B.F. Lang, C. Lemieux, T.G. Littlejohn, P. Rioux*, D. Sankoff, and M. Turmel.

Canadian Institute for Advanced Research; *Département de Biochimie, Université de Montréal, Montréal, Québec, H3C 3J7, Canada.

We are systematically investigating mitochondrial DNAs (mtDNAs) from protistan eukaryotes. The data generated are used for comparative studies of gene structure, gene content, genome organization and expression, and for inferring the evolutionary history of the mitochondrial genome. For a genomics approach, organelles offer an important advantage: many completely sequenced genomes are already available, and more are currently being determined. No larger collection of completely sequenced eukaryotic genomes currently exists. Detailed information about this and related research can be obtained via the World Wide Web at URL:

The Organelle Genome Megasequencing Program (OGMP) shares many of the data management and analysis challenges faced by other genomics groups: gene identification and inference of function, structural prediction of gene products, and integration of molecular, genetic, biochemical and genome organization data. Specific aims of the OGMP informatics effort include developing software tools for accessing existing large networked databases; providing specialized databases to the wider scientific community; enhancing existing genome analysis toolsets and developing additional tools for managing large sequencing projects; and developing comparative genomics analysis tools, with an emphasis on phylogenetics. Key issues for the OGMP are the management and analysis of genomic data and their efficient, integrated, global distribution in a form readily accessible to researchers in the field.

The recently initiated Organelle Genome Database Project (GOBASE) addresses the present difficulty, indeed impossibility, of accessing all of the relevant information associated with organelles. Often, data are dispersed among a number of sources, are difficult to locate, are incomplete and may also contain errors. In their current disorganized state, organelle genomic data constitute a major underexploited source of information. GOBASE is intended to rectify this situation.

The first part of the presentation will focus on the GOBASE project, which is currently in a late prototype phase of development. We will present the different components of the system and the ways in which they interact, and comment on the particular design decisions taken and on the informatics tools developed specifically for the project.

The second part of the presentation will provide an overview of the data management infrastructure created for the sequencing efforts of the OGMP. This includes the validation of sequence data; assembly using third-party software; and maintenance of sequence feature annotations throughout the numerous assembly cycles up to the finishing process and subsequent submission of the sequences to public data repositories.
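The abstract does not say how feature annotations are carried through repeated assembly cycles. One common approach, sketched here in Python as an illustration and not as the OGMP's actual method, is to record each change to a contig as an insertion or deletion at a position and shift every annotated feature's coordinates accordingly. The `Feature` type, the `(position, delta)` edit encoding, the function names and the example coordinates are all hypothetical; coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """A hypothetical annotated feature on one contig (0-based, half-open)."""
    name: str
    start: int
    end: int


# Hypothetical edit encoding (position, delta): delta > 0 inserts `delta` bases
# before `position`; delta < 0 deletes `-delta` bases starting at `position`.
Edit = Tuple[int, int]


def remap_position(pos: int, edit_pos: int, delta: int, is_end: bool = False) -> int:
    """Map one coordinate from the old assembly into the edited assembly."""
    # Coordinates before the edit site never move. An exclusive end equal to
    # the edit site also stays, so bases inserted at a feature boundary are
    # not absorbed into that feature.
    if pos < edit_pos or (is_end and pos == edit_pos):
        return pos
    if delta >= 0:
        return pos + delta  # shifted right by the insertion
    # Deletion: coordinates inside the deleted block collapse onto edit_pos;
    # coordinates after it shift left by the deleted length.
    return max(edit_pos, pos + delta)


def remap_feature(f: Feature, edit: Edit) -> Optional[Feature]:
    """Return the feature in edited coordinates, or None if fully deleted."""
    edit_pos, delta = edit
    start = remap_position(f.start, edit_pos, delta)
    end = remap_position(f.end, edit_pos, delta, is_end=True)
    if end <= start:
        return None  # every base of the feature was deleted
    return replace(f, start=start, end=end)


def remap_features(features: List[Feature], edits: List[Edit]) -> Tuple[List[Feature], List[Feature]]:
    """Apply edits in order; each edit position refers to the assembly produced
    by the edits before it. Returns (surviving features, lost features), where a
    lost feature is reported as it stood just before the edit that removed it."""
    current: List[Feature] = list(features)
    lost: List[Feature] = []
    for edit in edits:
        kept: List[Feature] = []
        for f in current:
            moved = remap_feature(f, edit)
            if moved is None:
                lost.append(f)
            else:
                kept.append(moved)
        current = kept
    return current, lost


if __name__ == "__main__":
    # Hypothetical coordinates: a 3-base insertion, then an 80-base deletion.
    features = [Feature("cox1", 100, 1700), Feature("trnM", 1800, 1871)]
    kept, lost = remap_features(features, [(50, 3), (1803, -80)])
    print(kept)  # [Feature(name='cox1', start=103, end=1703)]
    print(lost)  # [Feature(name='trnM', start=1803, end=1874)]
```

Returning deleted features separately, rather than dropping them silently, lets a curator review annotations that no longer map onto the assembly before the sequence is finished and submitted.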