Adjustment for Population Stratification via Principal Components in Association Analysis of Rare Variants

Yiwei Zhang, Weihua Guan, Wei Pan

Research output: Contribution to journal, Article, peer-review

26 Scopus citations


For unrelated samples, principal component (PC) analysis has been established as a simple and effective approach to adjusting for population stratification in association analysis of common variants (CVs, with minor allele frequency, MAF, > 5%). However, it is less clear how it would perform in analysis of low-frequency variants (LFVs, MAF between 1% and 5%), or of rare variants (RVs, MAF < 1%). Furthermore, with next-generation sequencing data, it is unknown whether PCs should be constructed based on CVs, LFVs, or RVs. In this study, we used the 1000 Genomes Project sequence data to explore the construction of PCs and their use in association analysis of LFVs or RVs for unrelated samples. It is shown that a few top PCs based on either CVs or LFVs could separate two continental groups, European and African samples, but those based on only RVs performed less well. When applied to several association tests in simulated data with population stratification, using PCs based on either CVs or LFVs was effective in controlling Type I error rates, while nonadjustment led to inflated Type I error rates. Perhaps the most interesting observation is that, although the PCs based on LFVs could better separate the two continental groups than those based on CVs, the use of the former could lead to overadjustment in the sense of substantial power loss in the absence of population stratification; in contrast, we did not see any problem with the use of the PCs based on CVs in all our examples.
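
The PC construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotypes are simulated toy data rather than 1000 Genomes sequences, and the function names (`minor_allele_freq`, `top_pcs`, `simulate_two_populations`), sample sizes, and allele-frequency model are all assumptions made for illustration. The sketch bins variants by MAF and computes the top PCs from the common-variant subset with a plain SVD.

```python
import numpy as np

def minor_allele_freq(G):
    """MAF per variant for a (samples x variants) matrix coded 0/1/2."""
    p = G.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)

def top_pcs(G, n_pcs=2):
    """Top PC scores from a column-standardized genotype matrix."""
    sd = G.std(axis=0)
    keep = sd > 0                      # drop monomorphic variants
    Z = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]

def simulate_two_populations(n_per_pop=100, n_variants=2000, seed=0):
    """Toy data (assumption): two groups with shifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.005, 0.5, size=n_variants)
    shift = rng.normal(0.0, 0.1, size=n_variants)
    freq_b = np.clip(freq_a + shift, 0.001, 0.999)
    G_a = rng.binomial(2, freq_a, size=(n_per_pop, n_variants))
    G_b = rng.binomial(2, freq_b, size=(n_per_pop, n_variants))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([G_a, G_b]).astype(float), labels

G, pop = simulate_two_populations()
maf = minor_allele_freq(G)

# MAF bins used to choose which variants enter the PC computation
common = maf > 0.05                       # CVs
low_freq = (maf >= 0.01) & (maf <= 0.05)  # LFVs
rare = (maf > 0) & (maf < 0.01)           # RVs: below the LFV range

# PCs built from common variants only
pcs_cv = top_pcs(G[:, common], n_pcs=2)
```

In an association analysis, the columns of `pcs_cv` would then enter the regression model as covariates alongside the variant or variant set being tested. Replacing `common` with `low_freq` or `rare` gives the alternative PC sets that the abstract compares.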

Original language: English (US)
Pages (from-to): 99-109
Number of pages: 11
Journal: Genetic Epidemiology
Issue number: 1
State: Published - Jan 2013


  • 1000 Genomes Project
  • Association tests
  • Logistic regression
  • Next-generation sequencing
  • SNP
  • SSU test

