Because sharing genomic data on a large scale raises issues of practicality and confidentiality, typically only meta- or mega-analyzed genome-wide association study (GWAS) summary data, not individual-level data, are publicly available. Reanalyses of such GWAS summary data for a wide range of applications have become increasingly common and useful; they often require an external reference panel with individual-level genotypic data to infer linkage disequilibrium (LD) among genetic variants. However, with a sample size of only a few hundred, as for the most popular 1000 Genomes Project European sample, estimation errors for LD are not negligible, often leading to dramatically increased numbers of false positives in subsequent analyses of GWAS summary data. To alleviate the problem in the context of association testing for a group of SNPs, we propose an alternative estimator of the covariance matrix based on an idea similar to multiple imputation. We use numerical examples based on both simulated and real data to demonstrate the severe problem with the use of the 1000 Genomes Project reference panels and the improved performance of our new approach.
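The inflation described above can be illustrated with a minimal simulation sketch. This is not the authors' estimator or their simulation design. It makes several simplifying assumptions: Gaussian variables stand in for genotypes, the true LD matrix has an AR(1) structure with correlation 0.9, and the test is a chi-square (Wald-type) statistic on 20 null Z-scores. The function name and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

P = 20          # number of SNPs in the group (assumed)
CRIT = 31.410   # 0.95 quantile of the chi-square distribution with 20 df


def simulate_type1_error(n_ref=100, rho=0.9, n_rep=2000, seed=0):
    """Compare null rejection rates using the true LD matrix versus an
    LD matrix estimated from a reference panel of n_ref individuals."""
    rng = np.random.default_rng(seed)
    idx = np.arange(P)
    # Assumed true LD (correlation) matrix with AR(1) structure.
    R = rho ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(R)

    reject_true = 0
    reject_ref = 0
    for _ in range(n_rep):
        # Null summary Z-scores: z ~ N(0, R).
        z = L @ rng.standard_normal(P)
        # Reference panel: n_ref rows drawn from N(0, R).
        X = rng.standard_normal((n_ref, P)) @ L.T
        R_hat = np.corrcoef(X, rowvar=False)

        # Quadratic-form test statistic z' R^{-1} z under each LD matrix.
        t_true = z @ np.linalg.solve(R, z)
        t_ref = z @ np.linalg.solve(R_hat, z)
        reject_true += t_true > CRIT
        reject_ref += t_ref > CRIT

    return reject_true / n_rep, reject_ref / n_rep


if __name__ == "__main__":
    rate_true, rate_ref = simulate_type1_error()
    print(f"Type I error with true LD:      {rate_true:.3f}")
    print(f"Type I error with estimated LD: {rate_ref:.3f}")
```

Under these assumptions, the rejection rate with the true LD matrix should stay near the nominal 0.05 level, while the rate with the panel-estimated matrix should be noticeably higher. This is because the inverse of a noisy correlation estimate tends to inflate the quadratic form.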
Bibliographical note
Funding Information:
The authors thank the reviewers for many helpful comments and Chong Wu for help with the data. We downloaded the LHS data from dbGaP. This research was supported by National Institutes of Health grants R21 AG-057038, R01 HL-116720, R01 GM-113250, and R01 HL-105397; by National Science Foundation grant DMS1711226; and by the Minnesota Supercomputing Institute.
© 2018, Genetics Society of America. All rights reserved.
- 1000 Genomes Project
- COJO analysis
- Gene-based testing
- Multiple SNPs
- Multiple imputation
- Type I error
- Wald test