Operons have not been studied extensively outside of Escherichia coli
and Bacillus subtilis.
To predict operons in other prokaryotes,
we combine comparative genomics predictions of conserved operons
with probabilistic models of distances between genes in the same operon.
Unlike previous efforts, which apply
distance models from known E. coli
operons to other organisms,
we infer genome-specific distance models from
the comparative genomics predictions and their estimated error rates.
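
As a rough illustration of how a genome-specific distance model could be combined with a comparative genomics prediction, the sketch below bins intergenic distances, estimates a per-bin log-likelihood ratio from two sets of adjacent gene pairs, and updates a prior probability with Bayes' rule. The function names, the 20 bp bin width, the clamping range, and the pseudocount are all assumptions made for this example, not the authors' implementation, and the sketch does not model the estimated error rates of the comparative predictions.

```python
import math
from collections import Counter

def bin_distance(d, width=20, lo=-50, hi=300):
    """Clamp an intergenic distance (bp) to [lo, hi] and return its bin index."""
    d = max(lo, min(hi, d))
    return (d - lo) // width

def distance_log_odds(operon_like, non_operon_like, pseudocount=1.0):
    """Per-bin log[P(bin | same operon) / P(bin | different operons)].

    Both arguments are lists of intergenic distances; in this sketch they
    are assumed to come from comparative genomics predictions (hypothetical
    inputs). Pseudocounts keep every observed bin at a finite ratio.
    """
    pos = Counter(bin_distance(d) for d in operon_like)
    neg = Counter(bin_distance(d) for d in non_operon_like)
    bins = set(pos) | set(neg)
    pos_total = len(operon_like) + pseudocount * len(bins)
    neg_total = len(non_operon_like) + pseudocount * len(bins)
    return {
        b: math.log((pos[b] + pseudocount) / pos_total)
           - math.log((neg[b] + pseudocount) / neg_total)
        for b in bins
    }

def posterior_same_operon(prior, log_odds):
    """Combine a prior probability with a log-likelihood ratio (Bayes' rule)."""
    logit = math.log(prior / (1.0 - prior)) + log_odds
    return 1.0 / (1.0 + math.exp(-logit))

# Toy distances (bp), invented for illustration only.
conserved = [-4, 5, 10, 12, 30, 8, -1, 15]
not_conserved = [150, 200, 250, 90, 300, 120, 180, 60]
model = distance_log_odds(conserved, not_conserved)

# Bins never seen in either set fall back to a neutral log-odds of 0.
p = posterior_same_operon(0.5, model.get(bin_distance(11), 0.0))
```

With these toy inputs, a pair separated by 11 bp ends up with a posterior above the 0.5 prior, because short distances are more common in the operon-like set.
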
We validate our predictions against known operons from E. coli
and B. subtilis
and against microarray data for six diverse prokaryotes,
testing whether adjacent genes predicted to be in the same operon (or not) are coexpressed.
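
The coexpression test can be pictured with a short sketch: for each adjacent gene pair, compute the Pearson correlation of the two expression profiles across conditions, then compare the average for pairs predicted to be in the same operon with the average for pairs predicted not to be. The gene names, profiles, and helper names below are hypothetical, and the use of a plain mean of Pearson correlations is an assumption of this example rather than the statistic reported in the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / math.sqrt(vx * vy)

def mean_coexpression(pairs, expression):
    """Mean Pearson r over gene pairs for which both profiles are available."""
    rs = [pearson(expression[a], expression[b])
          for a, b in pairs
          if a in expression and b in expression]
    return sum(rs) / len(rs) if rs else float("nan")

# Toy log-ratios across four conditions, invented for illustration only.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
same_operon_pairs = [("geneA", "geneB")]       # predicted same operon
different_operon_pairs = [("geneB", "geneC")]  # predicted different operons

r_same = mean_coexpression(same_operon_pairs, expression)
r_diff = mean_coexpression(different_operon_pairs, expression)
```

If the predictions are informative, `r_same` should be clearly higher than `r_diff`, as it is for these toy profiles.
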
Genome-specific distance models for the archaeon Halobacterium sp. NRC-1
and for Helicobacter pylori
differ from E. coli's
distance model, and we use microarray data to confirm these differences.
Furthermore, H. pylori
has many operons,
contrary to earlier reports, and Synechocystis sp. PCC 6803
has significant numbers of operons despite its unusual distance model.
Finally, genomes with most of their genes on the leading strand
of DNA replication have an even higher proportion of their multiple-gene transcripts on the leading strand. We
use this observation to estimate the number of operons in strand-biased genomes and to
improve our predictions significantly.
For further information,
browse over 100 prokaryotic genomes,
read a preprint,
download predictions from our FTP site,
or contact Eric Alm.