We define a 17-dimensional vector, the proteome vector, to represent an organism. Its components reflect the relative content of protein-encoding genes in each of the 17 functional classes of the Clusters of Orthologous Groups of proteins (COGs) across the organism's whole genome. Using this vector, we performed fuzzy clustering of 36 completely sequenced organisms (8 archaea, 24 bacteria, and 4 eukarya) and constructed a proteome tree. Our results show that (1) all 36 organisms are correctly classified into three clusters corresponding to the three primary kingdoms, (2) the proteome tree is remarkably similar to that derived from 16S rRNA, and (3) the chromosomes and/or plasmids belonging to the same organism have very similar gene composition. Based on these results, we argue that the 17-dimensional proteome vector could be a good criterion for clustering and to a large extent reveals the phylogenetic properties of organisms, and that the Three Primary Kingdoms Hypothesis is trustworthy even though lateral gene transfer (LGT) makes the construction of the "universal tree of life" controversial.
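As a rough illustration of the definition above, the following Python sketch builds a proteome vector from a list of per-gene COG class assignments and compares two such vectors. Several details are assumptions rather than the authors' procedure: the class labels `class_1` to `class_17` are placeholders (the abstract does not list the 17 classes), "relative content" is taken to mean the fraction of class-assigned genes, the Euclidean distance is only one possible way to compare vectors, and the toy genomes are made up.

```python
from collections import Counter
from math import sqrt

# Hypothetical placeholder labels for the 17 COG classes; the abstract
# does not say which classes are used, so these names are assumptions.
COG_CLASSES = [f"class_{i}" for i in range(1, 18)]


def proteome_vector(gene_classes, classes=COG_CLASSES):
    """Return, for each class, the fraction of class-assigned genes in it.

    Assumption: "relative content" means counts normalized to sum to 1.
    Genes whose label is not in `classes` are ignored.
    """
    counts = Counter(c for c in gene_classes if c in classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genes assigned to any of the given classes")
    return [counts[c] / total for c in classes]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy, made-up annotations: one class label per protein-encoding gene.
genome_a = ["class_1"] * 3 + ["class_2"] * 1
genome_b = ["class_1"] * 1 + ["class_2"] * 3

vec_a = proteome_vector(genome_a)
vec_b = proteome_vector(genome_b)
distance_ab = euclidean(vec_a, vec_b)
```

Vectors built this way for each organism (or for each chromosome and plasmid separately) could then be passed to a clustering routine; the fuzzy clustering step itself is not sketched here.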
Bibliographical note
Funding Information:
The study was supported by Grants 39890070, 19890380, and 39993420 from the China National Foundation of Science (CNFS), by Grants KSCX2-2-07 and KJCX1-08 from the Chinese Academy of Science, and by a special grant from the Science and Technology Committee of Beijing.