Background: Genome-wide association studies (GWAS) detect associations between millions of genetic variants and a trait, typically by using univariate regression to test the association between each individual variant and the phenotype. Alternatively, Lasso penalized regression allows one to jointly model the relationship between all genetic variants and the phenotype. However, it is unclear how best to conduct inference on the individual Lasso coefficients, especially in high-dimensional settings.

Methods: We consider six methods for testing the Lasso coefficients: two permutation approaches (Lasso-Ayers, Lasso-PL) and one analytic approach (Lasso-AL) that select the penalty parameter to control the type-1 error, as well as a residual bootstrap (Lasso-RB), a modified residual bootstrap (Lasso-MRB), and a permutation test (Lasso-PT). The methods are compared via simulations and an application to the Minnesota Center for Twins and Family Study.

Results: We show that, for finite sample sizes and an increasing number of null predictors, Lasso-RB, Lasso-MRB, and Lasso-PT fail to be viable methods of inference. In contrast, Lasso-PL and Lasso-AL remain fast and powerful tools for conducting inference with the Lasso, even in high dimensions.

Conclusion: Our results suggest that the proposed permutation selection procedure (Lasso-PL) and the analytic selection method (Lasso-AL) are fast and powerful alternatives to the standard univariate analysis in genome-wide association studies.
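The idea of choosing the Lasso penalty by permutation so that the type-1 error is controlled can be illustrated with a minimal sketch. It assumes a centred phenotype, genotypes standardized to unit variance, and the Lasso objective (1/(2n))||y - Xb||^2 + lambda*||b||_1. Under these assumptions no variant is selected once lambda >= max_j |x_j^T y| / n, so the sketch takes lambda as the (1 - alpha) quantile of that quantity over permuted phenotypes. This is a generic permutation scheme written for illustration, not the paper's exact Lasso-Ayers or Lasso-PL procedure. All function names are hypothetical, and a hand-written coordinate-descent solver stands in for a production package.

```python
import numpy as np


def standardize(X):
    """Centre each column and scale it to unit (population) variance."""
    sd = X.std(axis=0)
    if np.any(sd == 0):
        raise ValueError("monomorphic (zero-variance) columns must be removed first")
    return (X - X.mean(axis=0)) / sd


def lambda_max(X, y):
    """Smallest penalty at which b = 0 solves the Lasso
    (1/(2n))||y - Xb||^2 + lam * ||b||_1, for centred X and y."""
    return np.max(np.abs(X.T @ y)) / X.shape[0]


def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Plain cyclic coordinate descent for the Lasso objective above."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # x_j^T x_j / n
    r = y.astype(float).copy()         # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # correlation of column j with the partial residual (excluding j)
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            delta = b[j] - old
            if delta != 0.0:
                r -= X[:, j] * delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def permutation_lambda(X, y, alpha=0.05, n_perm=200, seed=0):
    """(1 - alpha) quantile of lambda_max over permuted phenotypes. Under the
    global null, the chance that any variant enters at this penalty is about alpha."""
    rng = np.random.default_rng(seed)
    null_lams = [lambda_max(X, rng.permutation(y)) for _ in range(n_perm)]
    return float(np.quantile(null_lams, 1 - alpha))


def permutation_selected_lasso(X, y, alpha=0.05, n_perm=200, seed=0):
    """Choose lam by permutation, fit the Lasso, return lam and selected columns."""
    Xs = standardize(np.asarray(X, dtype=float))
    yc = np.asarray(y, dtype=float) - np.mean(y)
    lam = permutation_lambda(Xs, yc, alpha, n_perm, seed)
    b = lasso_cd(Xs, yc, lam)
    return lam, np.flatnonzero(b)


if __name__ == "__main__":
    # Hypothetical simulated data: additive SNP dosages (0/1/2), three causal variants.
    rng = np.random.default_rng(1)
    n, p = 300, 300
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    beta = np.zeros(p)
    beta[:3] = 2.0
    y = X @ beta + rng.normal(size=n)
    lam, selected = permutation_selected_lasso(X, y)
    print(f"lambda = {lam:.3f}; selected columns: {selected.tolist()}")
```

In this sketch the penalty is tied to the null distribution of the largest marginal association, so the selected set needs no further multiple-testing adjustment. The trade-off is that the error guarantee concerns the global null (no associated variant) rather than each coefficient individually.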
Funding Information:
This work was carried out in part using computing resources at the University of Minnesota Supercomputing Institute. The MCTFR Study is a collaborative study supported by DA13240, DA05147, AA09367, AA11886, and MH066140. We would like to thank the Editor and the reviewers for their helpful comments, which improved this paper. This research was supported by NIH grants DA033958 (PI: Saonli Basu) and T32GM108557 (PI: Wei Pan).
© 2017 The Author(s).