Prediction and Validation of Operons in Uncharacterized Prokaryotes

by Morgan N. Price, Katherine H. Huang, Eric J. Alm, and Adam P. Arkin

Operons have not been studied extensively outside of Escherichia coli and Bacillus subtilis. To predict operons in other prokaryotes, we combine comparative genomics predictions of conserved operons with probabilistic models of distances between genes in the same operon. Unlike previous efforts, which apply distance models from known E. coli operons to other organisms, we infer genome-specific distance models from the comparative genomics predictions and their estimated error rates. We validate our predictions against known operons from E. coli and B. subtilis and against microarray data for six diverse prokaryotes, testing whether adjacent genes predicted to be in the same operon (or not) are coexpressed. Genome-specific distance models for the archaeon Halobacterium sp. NRC-1 and for Helicobacter pylori are significantly different from E. coli's distance model, and we use microarray data to confirm these differences. Furthermore, H. pylori has many operons, contrary to earlier reports, and Synechocystis sp. PCC 6803 has significant numbers of operons despite its unusual distance distribution. Finally, genomes with most of their genes on the leading strand of DNA replication have an even higher proportion of their multiple-gene transcripts on the leading strand. We use this observation to estimate the number of operons in strand-biased genomes and to improve our predictions significantly.
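
As a rough illustration of what a probabilistic distance model can look like, the sketch below bins the intergenic distances of adjacent same-strand gene pairs and applies Bayes' rule to estimate the probability that a pair lies in the same operon. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the bin edges, the pseudocount, the prior, the toy distances, and all function names are hypothetical, and it ignores the correction for error rates in the comparative genomics predictions that the abstract describes.

```python
from bisect import bisect_right

def bin_index(distance, edges):
    """Return the index of the histogram bin that contains `distance`."""
    return bisect_right(edges, distance)

def binned_frequencies(distances, edges, pseudocount=1.0):
    """Smoothed fraction of distances in each bin (pseudocount avoids zeros)."""
    counts = [pseudocount] * (len(edges) + 1)
    for d in distances:
        counts[bin_index(d, edges)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def operon_posterior(distance, operon_freq, other_freq, edges, prior=0.5):
    """Bayes' rule: P(same operon | intergenic distance)."""
    i = bin_index(distance, edges)
    numerator = prior * operon_freq[i]
    return numerator / (numerator + (1.0 - prior) * other_freq[i])

# Hypothetical bin edges in base pairs: <0 (overlap), 0-49, 50-149, >=150.
EDGES = [0, 50, 150]

# Hypothetical toy distances. In practice the first list would come from
# pairs that comparative genomics predicts to be in conserved operons, and
# the second from pairs predicted not to be.
operon_like = [-4, -1, 5, 10, 12, 30, 60, 8]
other_pairs = [40, 120, 200, 250, 300, 90, 500, 180]

operon_freq = binned_frequencies(operon_like, EDGES)
other_freq = binned_frequencies(other_pairs, EDGES)

for d in (10, 200):
    p = operon_posterior(d, operon_freq, other_freq, EDGES)
    print(f"distance {d:>4} bp: P(same operon) = {p:.3f}")
```

Because the frequencies are estimated from each genome's own gene pairs, the same code run on a different genome yields a different distance model, which is the sense in which such a model is genome-specific.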

For further information, browse over 100 prokaryotic genomes, read a preprint, download predictions from our ftp site, or contact Eric Alm.

