Adjusting for population stratification in a fine scale with principal components and sequencing data

Yiwei Zhang; Xiaotong T Shen; Wei Pan

doi:10.1002/gepi.21764

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Population stratification is of primary interest in genetic studies to infer human evolution history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures in finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated the varying performance of adjusting for population stratification with different types and sets of variants when testing on different types of variants. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflations caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.
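As a rough illustration of the PCA step the abstract refers to, the sketch below computes principal component scores from a matrix of minor-allele counts and checks that the first component separates two simulated subgroups. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `top_principal_components` and `simulate_two_subgroups`, the simulated allele frequencies, and the sample sizes are all hypothetical, and the example does not cover SDR or the association tests evaluated in the paper.

```python
import numpy as np


def top_principal_components(genotypes, n_components=2):
    """Return PC scores for an (individuals x variants) matrix of 0/1/2 allele counts.

    Hypothetical helper for illustration only.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop monomorphic variants: a zero-variance column cannot be standardized.
    g = g[:, g.std(axis=0) > 0]
    # Center and scale each variant column.
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # SVD of the standardized matrix; left singular vectors scaled by the
    # singular values give the principal component scores per individual.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_two_subgroups(n_per_group=100, n_variants=500, seed=0):
    """Simulate genotypes for two subgroups with shifted allele frequencies.

    All parameter values are assumptions chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.5, size=n_variants)
    shift = rng.normal(0.0, 0.1, size=n_variants)
    freq_a = np.clip(base + shift, 0.02, 0.98)
    freq_b = np.clip(base - shift, 0.02, 0.98)
    geno_a = rng.binomial(2, freq_a, size=(n_per_group, n_variants))
    geno_b = rng.binomial(2, freq_b, size=(n_per_group, n_variants))
    labels = np.repeat([0, 1], n_per_group)
    return np.vstack([geno_a, geno_b]), labels


genotypes, labels = simulate_two_subgroups()
pcs = top_principal_components(genotypes, n_components=2)
# Absolute correlation between PC1 and the (normally unknown) subgroup label.
corr = abs(np.corrcoef(pcs[:, 0], labels)[0, 1])
print(pcs.shape, round(corr, 2))
```

In an association analysis, scores like `pcs` would typically be included as covariates in the regression model for each tested variant; restricting the input columns to common or rare variants before calling the function is one way to mimic the comparison the abstract describes.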

Original language	English (US)
Pages (from-to)	787-801
Number of pages	15
Journal	Genetic Epidemiology
Volume	37
Issue number	8
DOIs	https://doi.org/10.1002/gepi.21764
State	Published - Dec 2013

Keywords

1000 Genomes Project
Association testing
Common variants
Principal component analysis
Rare variants
Spectral analysis

Cite this

@article{68227d9cdd874a68bd7aa1b82e1802d9,

title = "Adjusting for population stratification in a fine scale with principal components and sequencing data",

abstract = "Population stratification is of primary interest in genetic studies to infer human evolution history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures in finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated the varying performance of adjusting for population stratification with different types and sets of variants when testing on different types of variants. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflations caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.",

keywords = "1000 Genomes Project, Association testing, Common variants, Principal component analysis, Rare variants, Spectral analysis",

author = "Zhang, Yiwei and Shen, {Xiaotong T} and Pan, Wei",

year = "2013",

month = dec,

doi = "10.1002/gepi.21764",

language = "English (US)",

volume = "37",

pages = "787--801",

journal = "Genetic Epidemiology",

issn = "0741-0395",

publisher = "Wiley-Liss Inc.",

number = "8",

}

TY - JOUR

T1 - Adjusting for population stratification in a fine scale with principal components and sequencing data

AU - Zhang, Yiwei

AU - Shen, Xiaotong T

AU - Pan, Wei

PY - 2013/12

Y1 - 2013/12

N2 - Population stratification is of primary interest in genetic studies to infer human evolution history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures in finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated the varying performance of adjusting for population stratification with different types and sets of variants when testing on different types of variants. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflations caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.

AB - Population stratification is of primary interest in genetic studies to infer human evolution history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures in finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated the varying performance of adjusting for population stratification with different types and sets of variants when testing on different types of variants. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflations caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.

KW - 1000 Genomes Project

KW - Association testing

KW - Common variants

KW - Principal component analysis

KW - Rare variants

KW - Spectral analysis

UR - http://www.scopus.com/inward/record.url?scp=84887609707&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887609707&partnerID=8YFLogxK

U2 - 10.1002/gepi.21764

DO - 10.1002/gepi.21764

M3 - Article

C2 - 24123217

AN - SCOPUS:84887609707

SN - 0741-0395

VL - 37

SP - 787

EP - 801

JO - Genetic Epidemiology

JF - Genetic Epidemiology

IS - 8

ER -
