Biological databases in the genomic era

Gertraud Burger

Robert-Cedergren Centre (Centre de recherche en génomique et bioinformatique)

Université de Montréal, Canada.

Presented at the symposium “La bioinformatique, du fondamental à l’appliqué” (Bioinformatics, from fundamental to applied).

Aurillac, France, 14-15 June 2004.

Recent large-scale DNA sequencing projects worldwide aim to explore the diversity of life systematically and comprehensively. Among the ongoing Canadian projects is the Protist EST Program (PEP), which focuses on unicellular microbial eukaryotes, known as protists (protozoa). This enormous group of eukaryotes encompasses more evolutionary, ecological and probably biochemical diversity than the multicellular kingdoms of animals, plants and fungi combined. Another Canadian project is the Organelle Genome Megasequencing Program (OGMP), which focuses, also from a taxonomically broad perspective, on two key organelles of the eukaryotic cell: mitochondria and chloroplasts. The large body of DNA sequences generated by these projects is organized and integrated in relational databases (PEPdb and GOBASE, respectively), together with related information on enzymes, genetic maps, RNA secondary structures, organisms and taxonomy, as well as a gene ontology framework.

Bioinformatics plays a pivotal role in endeavors of this kind. The major challenges include:

·       Formalization of the data in a model that captures their essential features and most instances of the relevant biological phenomena;

·       Organization of the data in a way that is easy to update, maintain and expand;

·       Powerful data consistency checking and validation capabilities;

·       Intuitive data representation and efficient retrieval.

Integrated biological databases such as the ones mentioned above provide a novel, global view of the massive diversity of data in the life sciences. Such databases are bioinformatics tools par excellence: they constitute a unique resource for scientists to investigate the most complex fundamental biological questions, such as the origin of cellular life and the exchange of genetic material across domain boundaries.