De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. We then used 7 performance metrics to assess the accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. While we found that de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. These transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and of the variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.
Bibliographical note
Funding Information:
This research was supported in part by NIH grants 1UL1 RR029893 from the National Center for Research Resources (A.S.), R01 LM011179-01A1 from the National Library of Medicine (A.S. and S.M.), and R01 GM107466 from the National Institute of General Medical Sciences (D.G.); by NSF grant MCB-1244219 (D.G.); by grants 863.07.007 (P.K.) and 864.11.010 (P.K.) from the Netherlands Organization of Scientific Research (NWO); and by a Dupont Young Professor award (D.G.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors acknowledge Frank C.P. Holstege for providing the targeted perturbation data that enabled construction of the gold-standard networks. The authors are also grateful to Efstratios Efstathiadis and Eric Peskin for their help with providing access to, and running experiments on, the high performance computing facility at New York University Langone Medical Center.
© 2014 Statnikov et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.