GCB '01, German Conference on Bioinformatics, October 7-10, 2001 Braunschweig, Germany

Building Organelle Genome Databases using an Object-Oriented Approach

A. Barbasiewicz, L. Liu, B.F. Lang, N. Shimko, G. Burger

Université de Montréal, Department of Biochemistry, 2900 Blvd Edouard-Montpetit, Montreal, QC, Canada

GOBASE is a specialized relational database that integrates diverse biological data on organelles (mitochondria and chloroplasts), such as DNA and protein sequences, RNA secondary structure diagrams, taxonomic information and genetic maps of completely sequenced mitochondrial DNAs. GOBASE has been publicly accessible via the WWW since 1996. It originally housed only mitochondrial data, and chloroplast data have recently been added. Today, GOBASE contains more than 80,000 sequences. Most of its data, namely sequence and taxonomic data, are retrieved from the public sequence repository at NCBI and validated in-house by experts. Maintaining a curated database is very labor-intensive. This is largely because genomic sequences are being generated at an unprecedented rate, and because records retrieved from public repositories contain annotation errors and nonstandard or misleading information that must be corrected.

Here, we present our efforts to substantially reduce manual data correction, increase automation and maximize code reusability by adopting object-oriented technology. Our initial approach has been to use the Unified Modeling Language (UML) to catalog the types of data inconsistency we have found in GOBASE. Each case is treated separately: an expert devises a solution, which is represented as a UML diagram. These diagrams then serve as templates for writing object-oriented automation programs.
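The following is a minimal sketch of how each catalogued inconsistency case might map onto a class in an object-oriented correction program. It assumes a simplified record type. The class names (`SequenceRecord`, `InconsistencyCase`, `NonstandardGeneName`, `MissingProduct`), the synonym table and the example records are all hypothetical and are not taken from GOBASE itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical, simplified view of a record retrieved from a public repository."""
    accession: str
    organism: str
    gene: str
    product: str
    notes: tuple = ()


class InconsistencyCase(ABC):
    """One catalogued type of data inconsistency (one UML 'case' turned into a class)."""

    name = "abstract case"

    @abstractmethod
    def detect(self, record: SequenceRecord) -> bool:
        """Return True if this inconsistency is present in the record."""

    @abstractmethod
    def resolve(self, record: SequenceRecord) -> SequenceRecord:
        """Return a corrected copy of the record, or a copy flagged for review."""


class NonstandardGeneName(InconsistencyCase):
    name = "nonstandard gene name"
    # Hypothetical synonym table (uppercase keys); a real one would be expert-curated.
    SYNONYMS = {"COI": "cox1", "COXI": "cox1", "ND1": "nad1", "NADH1": "nad1"}

    def detect(self, record):
        return record.gene.upper() in self.SYNONYMS

    def resolve(self, record):
        return replace(record, gene=self.SYNONYMS[record.gene.upper()])


class MissingProduct(InconsistencyCase):
    name = "missing product name"

    def detect(self, record):
        return not record.product.strip()

    def resolve(self, record):
        # No safe automatic fix exists, so flag the record for manual curation.
        return replace(record, notes=record.notes + ("needs expert review: product missing",))


def curate(record, cases):
    """Apply each case in order; return the curated record and the names of cases that fired."""
    fired = []
    for case in cases:
        if case.detect(record):
            record = case.resolve(record)
            fired.append(case.name)
    return record, fired


# Usage with a made-up record that exhibits both hypothetical inconsistencies.
raw = SequenceRecord("EXAMPLE1", "Example organism", "COI", "")
curated, fired = curate(raw, [NonstandardGeneName(), MissingProduct()])
print(curated.gene)  # cox1
print(fired)         # ['nonstandard gene name', 'missing product name']
```

In this design, handling a newly discovered kind of inconsistency means adding one subclass, while the `curate` driver stays unchanged. Cases that cannot be fixed automatically still fit the same interface by flagging the record for an expert instead of altering it.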