The genetic similarity between Mycobacterium avium subsp. paratuberculosis and other mycobacterial species has confounded the development of M. avium subsp. paratuberculosis-specific diagnostic reagents. Random shotgun sequencing of the M. avium subsp. paratuberculosis genome in our laboratories has shown >98% sequence identity with Mycobacterium avium subsp. avium in some regions. However, an in silico comparison of the largest annotated M. avium subsp. paratuberculosis contigs, totaling 2,658,271 bp, with the unfinished M. avium subsp. avium genome has revealed 27 predicted M. avium subsp. paratuberculosis coding sequences that do not align with M. avium subsp. avium sequences. BLASTP analysis of the 27 predicted coding sequences (genes) shows that 24 do not match sequences in public sequence databases, such as GenBank. These novel sequences were examined by PCR amplification with genomic DNA from eight mycobacterial species and ten independent isolates of M. avium subsp. paratuberculosis. From these analyses, 21 genes were found to be present in all M. avium subsp. paratuberculosis isolates and absent from all other mycobacterial species tested. One region of the M. avium subsp. paratuberculosis genome contains a cluster of eight genes, arranged in tandem, that is absent in other mycobacterial species. This region spans 4.4 kb and is separated from other predicted coding regions by 1,408 bp upstream and 1,092 bp downstream. The gene upstream of this eight-gene cluster has strong similarity to mycobacteriophage integrase sequences. The GC content of this 4.4-kb region is 66%, which is similar to the rest of the genome, indicating that this region was not horizontally acquired recently. Southern hybridization analysis confirmed that this gene cluster is present only in M. avium subsp. paratuberculosis. Collectively, these studies suggest that a genomics approach will help in identifying novel M. avium subsp. paratuberculosis genes as candidate diagnostic sequences.