The Organelle Genome Database Project (GOBASE)

MIMDB-95, Meeting on Interconnection of Molecular Biology Databases.
St. Louis, Mo., USA, June 1995.

Timothy G. Littlejohn, Gertraud Burger, Michael W. Gray*, Maria Korab-Laskowska and B. Franz Lang.
Département de biochimie, Université de Montréal, Montréal, Québec, Canada
*Department of Biochemistry, Dalhousie University, Halifax, Nova Scotia, Canada

Introduction

Why Organelle Genomes?

Organelles (mitochondria and chloroplasts) are of interest for several reasons, including their:

bacterial origins
relationship to the evolution of the nuclear genome
central role in eukaryotic cell energy production
involvement in human disease
utility as population markers

In addition, completely sequenced organelle genomes are already available, notably through efforts such as the "Organelle Genome Megasequencing Program" (OGMP).

Comparative Genomics

The ability to examine multiple genomes of the same type in detail has several advantages: it permits large-scale genome changes to be identified and allows observation of genome evolution as opposed to single-gene evolution. The dataset of organelle genomes is large enough to permit meaningful comparative genome analysis at the DNA sequence level, something that cannot yet be said of any other cellular genome.

Why an Organelle Genome Database?

At present it is difficult or impossible for researchers to access all the relevant information associated with an organelle genome. Data are dispersed among a number of sources, much of the relevant information is hard to reach, and it can be a major effort for a biologist even to locate these sources. There are usually few or no established links between the data sources: there is no easy way to "jump" from one source to the next, e.g. from a genetic map to the underlying sequences of the genes represented in that map, and from there to the phenotypic effects of mutations in those genes. Cross-genome comparisons of this type are more difficult still. Moreover, the data sets may be incomplete or not represented at all. In many cases, once the data are successfully retrieved, one discovers that they contain errors that may be hard to identify and to rectify in the underlying data source. In its present disorganised state, organelle genomic data constitute a major underexploited source of information. GOBASE will rectify this situation.

Goals of GOBASE

1. Creation of an integrated organelle genome database

GOBASE will provide a data repository upon which the genomics research community can draw and where it can deposit genomic information. The GOBASE project is unique in that it will bring together information on multiple organelle genomes (more than 40 complete organelle genome sequences have been published, a number that is growing rapidly).

GOBASE will be populated with a large array of data types including:

primary sequences (nucleotide and amino acid)
multiple sequence alignments
genetic and physical maps
RNA secondary structures
genotype and phenotype data
sequence polymorphisms
biochemical & physiological data
organismal information

Much of the data will be supplied by members of the federation of molecular biology databases (termed "data suppliers"). However, GOBASE will also contribute to this community by providing:

a unified and exhaustive view of organelle data
a source of "value added" data (i.e. with errors corrected and additional features added)
a supply of novel data (such as the organelle RNA secondary structure data to be generated by GOBASE's RNA Structure Unit)

Thus GOBASE has the data integration challenge of combining a large number of data types from a variety of data sources for a number of different organisms. In doing so, GOBASE will be a model for multiple genome comparative analysis. It will be adaptable and able to accommodate larger, more complicated genomes (e.g. those of plastids and bacteria) once a greater number of these genomes have been sequenced.

2. Generation of organelle RNA secondary structures. The GOBASE project will generate RNA secondary structures for inclusion in the database, as these data are often lacking. This repository will supplement and eventually supplant the traditional (printed) means of distributing secondary structure information.

3. Simplified data submission. GOBASE will be Internet-accessible and open to the public. The structure of the database will also permit submission of confidential data and allow password-protected access to that information. Users will be encouraged to submit material prior to publication, which will also enable contributors to see their new data at an early stage in the context of the integrated and enriched collections.

4. Bioinformatics research. GOBASE is a database designed to handle sequence and other information pertaining to multiple genomes of a given type. Software tools and methodologies are being developed to facilitate genomic and comparative genome analysis of the data stored in the database. These tools will permit groups involved in organelle research to make novel discoveries from the assembled data and will also provide fertile ground for novel bioinformatics research.
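As an illustration of the kind of comparative question such tools might address, the following minimal Python sketch compares gene content across several genomes. It is not part of GOBASE: the genome names and gene inventories are hypothetical placeholders, and the functions `shared_genes` and `unique_genes` are introduced here only for illustration.

```python
from typing import Dict, Set

# Hypothetical gene inventories: genome names and gene lists are
# illustrative placeholders, not data taken from GOBASE.
gene_content: Dict[str, Set[str]] = {
    "genome_A": {"cox1", "cox2", "cob", "atp6", "nad1"},
    "genome_B": {"cox1", "cob", "atp6", "atp9"},
    "genome_C": {"cox1", "cox2", "cob", "nad1", "atp9", "rps3"},
}


def shared_genes(content: Dict[str, Set[str]]) -> Set[str]:
    """Return the genes present in every genome."""
    sets = list(content.values())
    return set.intersection(*sets) if sets else set()


def unique_genes(content: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each genome, the genes found in no other genome."""
    result: Dict[str, Set[str]] = {}
    for name, genes in content.items():
        # Union of the gene sets of all other genomes.
        others = set().union(*(g for n, g in content.items() if n != name))
        result[name] = genes - others
    return result


if __name__ == "__main__":
    print("shared:", sorted(shared_genes(gene_content)))
    for genome, genes in unique_genes(gene_content).items():
        print(genome, "unique:", sorted(genes))
```

Set operations of this kind only become possible once gene annotations from different genomes are stored under consistent names, which is one motivation for an integrated database.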

GOBASE Implementation

The GOBASE Server. GOBASE is being built using proven relational database software technology (SYBASE) with the help of Extended Entity-Relationship Data Management Tools. The server runs on a Sun SPARCstation under Solaris 2.4.
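To make the relational approach concrete, the sketch below shows how linked tables allow a query to "jump" from a map position to a gene's sequence and to a phenotype. It is an assumption-laden illustration only: it uses Python's built-in sqlite3 module as a stand-in for SYBASE, and the table names, column names and rows are hypothetical and do not represent the actual GOBASE schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genome (
            genome_id INTEGER PRIMARY KEY,
            organism  TEXT NOT NULL,
            organelle TEXT NOT NULL
        );
        CREATE TABLE gene (
            gene_id      INTEGER PRIMARY KEY,
            genome_id    INTEGER NOT NULL REFERENCES genome(genome_id),
            name         TEXT NOT NULL,
            map_position INTEGER
        );
        CREATE TABLE sequence (
            gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
            residues TEXT NOT NULL
        );
        CREATE TABLE phenotype (
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            effect  TEXT NOT NULL
        );
        """
    )
    # Placeholder rows: none of these values are real data.
    conn.execute("INSERT INTO genome VALUES (1, 'Example organism', 'mitochondrion')")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [(1, 1, "cox1", 100), (2, 1, "cob", 250)],
    )
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?)",
        [(1, "ATGACC"), (2, "ATGGCA")],
    )
    conn.execute("INSERT INTO phenotype VALUES (1, 'hypothetical phenotype')")
    conn.commit()
    return conn


def genes_with_links(conn: sqlite3.Connection, organism: str):
    """Follow the links map position -> gene -> sequence -> phenotype."""
    query = """
        SELECT g.name, g.map_position, s.residues, p.effect
        FROM genome AS o
        JOIN gene AS g ON g.genome_id = o.genome_id
        LEFT JOIN sequence AS s ON s.gene_id = g.gene_id
        LEFT JOIN phenotype AS p ON p.gene_id = g.gene_id
        WHERE o.organism = ?
        ORDER BY g.map_position
    """
    return conn.execute(query, (organism,)).fetchall()


if __name__ == "__main__":
    for row in genes_with_links(build_demo_db(), "Example organism"):
        print(row)
```

The left joins keep genes in the result even when no sequence or phenotype record exists yet, which matters for data sets that are incomplete.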

Populating the database. GOBASE will be both a user of and a contributor to the federation of molecular biology databases accessible through the Internet. It will be populated by acquiring data from various data suppliers, as well as through direct submissions from organelle genome groups such as the OGMP. A number of tools are under development that will permit identification of organelle data within these supplied collections and that will aid in collating these data and placing them in their appropriate context. In the process, expert biologists will perform quality control on the data, correcting errors, adding logical links to other data, and supplementing the feature annotations of the data.
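The identification step could, in its simplest form, look like the following Python sketch. It is not one of the GOBASE tools: the `SupplierRecord` type, its fields, the keyword list and the accession labels are all assumptions made for illustration, and keyword matching of this kind would only be a first pass ahead of the expert curation described above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical keyword stems used to spot organelle entries in free text.
ORGANELLE_TERMS = ("mitochondri", "chloroplast", "plastid")


@dataclass
class SupplierRecord:
    """A hypothetical, minimal view of an entry from a data supplier."""
    accession: str
    description: str
    organelle: Optional[str] = None  # explicit annotation, if the supplier gives one


def is_organelle_record(record: SupplierRecord) -> bool:
    """Accept a record with an explicit annotation or a matching description."""
    if record.organelle:
        return True
    text = record.description.lower()
    return any(term in text for term in ORGANELLE_TERMS)


def select_organelle_records(records: Iterable[SupplierRecord]) -> List[SupplierRecord]:
    """Keep only the records that appear to describe organelle data."""
    return [r for r in records if is_organelle_record(r)]


if __name__ == "__main__":
    demo = [
        SupplierRecord("DEMO1", "Mitochondrial cox1 gene"),
        SupplierRecord("DEMO2", "Nuclear rRNA gene"),
        SupplierRecord("DEMO3", "Unannotated fragment", organelle="chloroplast"),
    ]
    for rec in select_organelle_records(demo):
        print(rec.accession, rec.description)
```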

Querying the database. Queries to GOBASE will be supported through WWW forms with interfaces built using the Web/Genera system.

GOBASE: A team player in the federation. GOBASE will rely heavily on many other members of the federation of molecular biology and other databases for much of the raw material that will populate it. At the same time, GOBASE will be an active contributor to this community by providing value-added, error-corrected and novel organelle data. The GOBASE project members look forward to working with other members of the community to facilitate the exchange of information and technologies that will continue the development of the federation.

Acknowledgements

Canadian Genome Analysis & Technology program (CGAT)

This abstract has been submitted to the Meeting on Interconnection of Molecular Biology Databases