Adjusting for population stratification in a fine scale with principal components and sequencing data

Yiwei Zhang, Xiaotong T Shen, Wei Pan

Research output: Contribution to journal › Article › peer-review

15 Scopus citations


Population stratification is of primary interest in genetic studies, both to infer human evolutionary history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures at finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated how the performance of adjusting for population stratification varies with the types and sets of variants used for adjustment and with the types of variants being tested. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflation caused by population stratification; in particular, contrary to much speculation about the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.
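As a rough illustration of the comparison described in the abstract, the sketch below computes principal components from all variants and from common variants only, on simulated genotypes. It is a minimal example under stated assumptions, not the authors' pipeline: the toy two-population data, the 0.05 MAF cutoff, the per-variant standardization, and the helper names `maf` and `top_pcs` are all hypothetical choices made for illustration, and SDR is not implemented here.

```python
import numpy as np


def maf(genotypes):
    """Minor allele frequency per variant from a (samples x variants) 0/1/2 matrix."""
    freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)


def top_pcs(genotypes, n_pcs=2):
    """Top principal components (sample scores) of a standardized genotype matrix."""
    p = genotypes.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # drop monomorphic variants to avoid dividing by zero
    g, p = genotypes[:, keep], p[keep]
    # Assumed standardization: center by 2p, scale by sqrt(2p(1-p)).
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Toy data (hypothetical): two subpopulations with shifted allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_variants = 100, 2000
base = rng.uniform(0.05, 0.5, n_variants)
shift = rng.normal(0, 0.1, n_variants)
freqs = [np.clip(base + shift, 0.01, 0.99), np.clip(base - shift, 0.01, 0.99)]
geno = np.vstack(
    [rng.binomial(2, f, size=(n_per_pop, n_variants)) for f in freqs]
).astype(float)
labels = np.repeat([0, 1], n_per_pop)

common = maf(geno) >= 0.05          # assumed cutoff for "common" variants
pcs_all = top_pcs(geno)             # PCs from all variants
pcs_common = top_pcs(geno[:, common])  # PCs from common variants only
```

In an association test, the columns of `pcs_all` or `pcs_common` would typically enter the regression model as covariates alongside the variant being tested; which variant set works best is the empirical question the paper addresses.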

Original language: English (US)
Pages (from-to): 787-801
Number of pages: 15
Journal: Genetic Epidemiology
Issue number: 8
State: Published - Dec 2013


  • 1000 Genomes Project
  • Association testing
  • Common variants
  • Principal component analysis
  • Rare variants
  • Spectral analysis
