The construction of genome-wide mutant collections has enabled high-throughput, high-dimensional quantitative characterization of gene and chemical function, particularly via genetic and chemical–genetic interaction experiments. As the throughput of such experiments increases with improvements in sequencing technology and sample multiplexing, appropriate tools must be developed to handle the large volume of data produced. Here, we describe how to apply our approach to high-throughput, fitness-based profiling of pooled mutant yeast collections using the BEAN-counter software pipeline (Barcoded Experiment Analysis for Next-generation sequencing) for analysis. The software has also successfully processed data from Schizosaccharomyces pombe, Escherichia coli, and Zymomonas mobilis mutant collections. We provide general recommendations for the design of large-scale, multiplexed barcode sequencing experiments. The procedure outlined here was used to score interactions for ~4 million chemical-by-mutant combinations in our recently published chemical–genetic interaction screen of nearly 14,000 chemical compounds across seven diverse compound collections. Here we selected a representative subset of these data on which to demonstrate our analysis pipeline. BEAN-counter is open source, written in Python, and freely available for academic use. Users should be proficient at the command line; advanced users who wish to analyze larger datasets with hundreds or more conditions should also be familiar with concepts in analysis of high-throughput biological data. BEAN-counter encapsulates the knowledge we have accumulated from, and successfully applied to, our multiplexed, pooled barcode sequencing experiments. This protocol will be useful to those interested in generating their own high-dimensional, quantitative characterizations of gene or chemical function in a high-throughput manner.
Bibliographical note
Funding Information:
S.W.S. thanks B. VanderSluis for proofreading the manuscript and testing the software and also A. Becker at the University of Minnesota Genomics Center for discussions regarding amplicon sequencing issues. This work was supported by RIKEN (http://www.riken.jp/en/) Strategic Programs for R&D, the National Institutes of Health (https://www.nih.gov/; R01HG005084, R01GM104975), and the National Science Foundation (https://www.nsf.gov/; DBI 0953881). S.W.S. was supported by an NSF Graduate Research Fellowship (00039202), an NIH Biotechnology training grant (T32GM008347), and a one-year fellowship from the University of Minnesota Bioinformatics and Computational Biology (BICB) Graduate Program (https://r.umn.edu/academics-research/graduate-programs/bicb). S.C.L. and J.S.P. were supported by a RIKEN Foreign Postdoctoral Research Fellowship. S.C.L. was supported by a RIKEN CSRS (http://www.csrs.riken.jp/en/) Research Topics for Cooperative Projects Award (201601100228) and a RIKEN FY2017 Incentive Research Projects Grant. H.N.W. was supported by a one-year BICB fellowship from the University of Minnesota. C.B. was supported by JSPS KAKENHI (https://www.jsps.go.jp/english/e-grants/) grant no. 15H04483. C.L.M. and C.B. are fellows in the Canadian Institute for Advanced Research (CIFAR, https://www.cifar.ca/) Genetic Networks Program. Computing resources and data storage services were partially provided by the Minnesota Supercomputing Institute and the UMN Office of Information Technology, respectively. Software licensing services were provided by the UMN Office for Technology Commercialization. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.