Operons have not been studied extensively outside of
Escherichia coli and
Bacillus subtilis.
To predict operons in other prokaryotes,
we combine comparative genomics predictions of conserved operons
with probabilistic models of distances between genes in the same operon.
Unlike previous efforts, which apply
distance models from known
E. coli operons to other organisms,
we infer genome-specific distance models from
the comparative genomics predictions and their estimated error rates.
We validate our predictions against known operons from
E. coli and
B. subtilis
and against microarray data for six diverse prokaryotes,
testing whether adjacent genes predicted to be in the same operon are coexpressed and whether adjacent genes predicted to be in different operons are not.
Genome-specific distance models for the archaeon
Halobacterium sp. NRC-1
and for
Helicobacter pylori are significantly
different from
E. coli's distance model, and we use microarray data to confirm these differences.
Furthermore,
H. pylori has many operons,
contrary to earlier reports, and
Synechocystis sp. PCC 6803
has significant numbers of operons despite its unusual distance
distribution.
Finally, genomes with most of their genes on the leading strand
of DNA replication have an even higher proportion of their multiple-gene transcripts on the leading strand. We
use this observation to estimate the number of operons in strand-biased genomes and to
improve our predictions significantly.
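
As a rough illustration of the genome-specific distance model described above, the following Python sketch estimates a per-genome log-likelihood ratio for each intergenic-distance bin and combines it with a prior to score a gene pair. It is only a minimal sketch under stated assumptions, not the method used in this work: the function names, the 20 bp bin width, the pseudocount, and the use of per-pair comparative-genomics probabilities as soft labels (a stand-in for the predictions and their estimated error rates) are all hypothetical.

```python
import math
from collections import defaultdict


def fit_distance_model(distances, p_operon, bin_width=20, pseudocount=1.0):
    """Estimate a log-likelihood ratio per distance bin for one genome.

    distances: intergenic distances (bp) between adjacent same-strand genes;
               negative values mean the genes overlap.
    p_operon:  hypothetical comparative-genomics probability that each pair
               is in the same operon, used here as a soft label.
    """
    operon_weight = defaultdict(float)
    other_weight = defaultdict(float)
    for d, p in zip(distances, p_operon):
        b = d // bin_width  # floor division also bins negative distances
        operon_weight[b] += p
        other_weight[b] += 1.0 - p

    bins = set(operon_weight) | set(other_weight)
    operon_total = sum(operon_weight.values()) + pseudocount * len(bins)
    other_total = sum(other_weight.values()) + pseudocount * len(bins)

    # log P(bin | same operon) - log P(bin | different operons)
    return {
        b: math.log((operon_weight[b] + pseudocount) / operon_total)
        - math.log((other_weight[b] + pseudocount) / other_total)
        for b in bins
    }


def posterior_same_operon(distance, llr, prior=0.5, bin_width=20):
    """Combine the prior odds with the distance evidence (Bayes' rule)."""
    log_odds = math.log(prior / (1.0 - prior)) + llr.get(distance // bin_width, 0.0)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Toy data (invented for illustration only).
toy_distances = [-4, 5, 10, 150, 300, 320]
toy_p_operon = [0.9, 0.9, 0.8, 0.3, 0.1, 0.1]
toy_llr = fit_distance_model(toy_distances, toy_p_operon)
```

With these toy values, a closely spaced pair scores above the prior and a widely spaced pair scores below it; a distance in a bin never seen during fitting falls back to the prior. Because the model is fit separately for each genome, an organism with an unusual distance distribution gets its own likelihood ratios instead of inheriting those of E. coli.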
For further information,
browse over 100 prokaryotic genomes,
read a preprint,
download predictions from our ftp site,
or contact Eric Alm.