Contamination is a critical issue in high-throughput metagenomic studies, yet progress toward a comprehensive solution has been limited. We present SourceTracker, a Bayesian approach to estimate the proportion of contaminants in a given community that come from possible source environments. We applied SourceTracker to microbial surveys from neonatal intensive care units (NICUs), offices and molecular biology laboratories, and provide a database of known contaminants for future testing.
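The abstract does not spell out the inference procedure, so the following is only a minimal sketch of the underlying idea: treating a sample as a mixture of known source environments and estimating the mixing proportions. It uses a plain expectation-maximization estimate as a simplified stand-in, not the Bayesian method the authors describe. The function name, the pseudocount, and the toy "gut" and "soil" profiles are all hypothetical.

```python
import numpy as np


def estimate_source_proportions(sink_counts, source_profiles,
                                n_iter=500, tol=1e-10, pseudocount=1e-9):
    """Estimate mixing proportions of fixed source profiles in one sample.

    Simplified EM stand-in (an assumption, not the SourceTracker algorithm).
    sink_counts:     taxon counts observed in the sample, shape (taxa,)
    source_profiles: taxon abundances per source, shape (sources, taxa)
    """
    counts = np.asarray(sink_counts, dtype=float)
    # The pseudocount keeps every taxon possible under every source,
    # so no division by zero occurs in the E-step.
    profiles = np.asarray(source_profiles, dtype=float) + pseudocount
    profiles = profiles / profiles.sum(axis=1, keepdims=True)

    n_sources = profiles.shape[0]
    mix = np.full(n_sources, 1.0 / n_sources)  # start from equal proportions

    for _ in range(n_iter):
        # E-step: probability that a read of each taxon came from each source.
        weighted = mix[:, None] * profiles
        resp = weighted / weighted.sum(axis=0)
        # M-step: proportions are the expected fraction of reads per source.
        new_mix = (resp * counts).sum(axis=1) / counts.sum()
        converged = np.abs(new_mix - mix).max() < tol
        mix = new_mix
        if converged:
            break
    return mix


# Toy data (hypothetical): two sources with non-overlapping taxa.
gut = [0.8, 0.2, 0.0, 0.0]
soil = [0.0, 0.0, 0.5, 0.5]
sample = [560, 140, 150, 150]  # built as 70% gut, 30% soil over 1000 reads

proportions = estimate_source_proportions(sample, [gut, soil])
print(dict(zip(["gut", "soil"], proportions.round(3))))
```

A full Bayesian treatment would additionally return uncertainty on these proportions and allow for sources that were never sampled; this sketch does neither.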
Bibliographical note
Funding Information:
We acknowledge funding from the US National Institutes of Health (R01HG4872, R01HG4866, U01HL098957 and P01DK78669), the Crohn’s and Colitis Foundation of America and the Howard Hughes Medical Institute, and we thank B. Prithiviraj for helpful insight into previous related work.