Genome Canada 1996, Grantee Meeting of the Canadian Genome Analysis and Technology Program.
Ottawa, Ont., 14-16 June 1996.

MANAGEMENT AND ANALYSIS OF SEQUENCE DATA FROM MULTIPLE ORGANELLE GENOMES.

G. Burger*, M. Laskowska, P. Rioux, T.G. Littlejohn, R. Cedergren, B. Golding, C. Lemieux, D. Sankoff, M. Turmel, M.W. Gray and B.F. Lang. (Presented by *)

Canadian Institute for Advanced Research, Departement de Biochimie, Universite de Montreal, Montreal, Quebec H3C 3J7, Canada.

We are systematically investigating mitochondrial DNAs (mtDNAs) from protistan eukaryotes. The data generated are used for comparative studies of gene structure, content, genome organization and expression, and for making inferences about the evolutionary history of the mitochondrial genome. In the context of a genomics approach, an important advantage offered by organelles is the number of genome sequences already available and currently being determined: no larger collection of completely sequenced eukaryotic genomes currently exists. Detailed information about this and related research can be obtained via the World Wide Web at URL http://megasun.bch.umontreal.ca/.

The Organelle Genome Megasequencing Program (OGMP) shares many of the data management and analysis challenges experienced by other genomics groups: gene identification/inference of function, gene product structural predictions, and integration of molecular, genetic, biochemical and genome organization data. Specific aims of the OGMP informatics efforts include the development of software tools for accessing existing large networked databases; enhancement of existing genome analysis toolsets; development of additional tools for management of large sequencing projects; and provision of comparative genomics analysis tools, with an emphasis on phylogenetics. An overview of the data management infrastructure will be presented. This includes the validation of sequence data; assembly using third-party software; and maintenance of sequence feature annotations throughout the numerous assembly cycles up to the finishing process and subsequent submission of the sequences to public data repositories.

The recently initiated Organelle Genome Database Project (GOBASE) addresses the present difficulty, indeed impossibility, of accessing all of the relevant information associated with organelles. Data are often dispersed among a number of sources, are difficult to locate, are incomplete and may also contain errors. In their current disorganized state, organelle genomic data constitute a major underexploited source of information. GOBASE is intended to rectify this situation. This project is currently in a late prototype phase of development, and different components of the system and the ways in which they interact will be presented. Comments will be made about the particular design decisions that were taken and the informatics tools that were specifically developed for the project.

Supported by MRC Canada (SP-34) and CGAT (GO-12323 and GO-12984).