Accurate and Efficient KIR Gene and Haplotype Inference From Genome Sequencing Reads With Novel K-mer Signatures

David Roe, Rui Kuang

Research output: Contribution to journal › Article › peer-review

Abstract

The killer-cell immunoglobulin-like receptor (KIR) proteins evolve to fight viruses and mediate the body’s reaction to pregnancy. These roles provide selection pressure for variation at both the structural/haplotype and base/allele levels. At the same time, the genes have evolved relatively recently by tandem duplication and therefore exhibit very high sequence similarity over thousands of bases. These variation-homology patterns make it impossible to interpret KIR haplotypes from abundant short-read genome sequencing data at population scale using existing methods. Here, we developed an efficient computational approach for in silico KIR probe interpretation (KPI) to accurately interpret an individual’s KIR genes and haplotype-pairs from KIR sequencing reads. We designed synthetic 25-base sequence probes by analyzing previously reported haplotype sequences, and we developed a bioinformatics pipeline to interpret the probes in the context of 16 KIR genes and 16 haplotype structures. We demonstrated its accuracy on a synthetic data set as well as on real whole-genome sequences from 748 individuals from The Genome of the Netherlands (GoNL). The GoNL predictions were compared with SNP-based predictions. Our results show a 100% accuracy rate for the synthetic tests and a 99.6% family-consistency rate in the GoNL tests. Agreement with the SNP-based calls on KIR genes ranges from 72% to 100% with a mean of 92%; most differences occur in genes KIR2DS2, KIR2DL2, KIR2DS3, and KIR2DL5, where KPI predicts presence and the SNP-based interpretation predicts absence. Overall, the evidence suggests that KPI’s accuracy is 97% or greater for both KIR gene and haplotype-pair predictions, and that presence/absence genotyping leads to ambiguous haplotype-pair predictions with 16 reference KIR haplotype structures. KPI is free, open, and easily executable as a Nextflow workflow supported by a Docker environment at https://github.com/droeatumn/kpi.
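The general idea in the abstract (match 25-base probes against reads, call each gene present or absent, then list the haplotype pairs that fit the calls) can be sketched as follows. This is a minimal illustration and not KPI's implementation: the probe sequences, the gene names `geneA` to `geneC`, the haplotype names `h1` to `h3`, and the `min_hits` threshold are all hypothetical toy values chosen for the example.

```python
from itertools import combinations_with_replacement

K = 25  # probe length stated in the abstract

# Hypothetical toy probes; real KIR probe sequences are not reproduced here.
PROBES = {
    "geneA": ["AAAAACCCCCGGGGGTTTTTAAAAA"],
    "geneB": ["ACGGT" * 5],
    "geneC": ["TTGCA" * 5],
}

# Hypothetical toy haplotype structures, each defined by the genes it carries.
HAPLOTYPES = {
    "h1": frozenset({"geneA", "geneB"}),
    "h2": frozenset({"geneA", "geneC"}),
    "h3": frozenset({"geneA", "geneB", "geneC"}),
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(read, k=K):
    """Yield every k-base window of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_probe_hits(reads, probes):
    """Count exact probe matches per gene, accepting either strand."""
    lookup = {}
    for gene, seqs in probes.items():
        for seq in seqs:
            lookup[seq] = gene
            lookup[reverse_complement(seq)] = gene
    hits = {gene: 0 for gene in probes}
    for read in reads:
        for kmer in kmers(read):
            gene = lookup.get(kmer)
            if gene is not None:
                hits[gene] += 1
    return hits


def call_genes(hits, min_hits=1):
    """Presence/absence genotype: genes with at least min_hits probe hits."""
    return frozenset(gene for gene, n in hits.items() if n >= min_hits)


def consistent_haplotype_pairs(present, haplotypes):
    """List haplotype pairs whose combined gene content equals the genotype."""
    pairs = []
    for a, b in combinations_with_replacement(sorted(haplotypes), 2):
        if haplotypes[a] | haplotypes[b] == present:
            pairs.append((a, b))
    return pairs
```

With these toy structures, a genotype in which all three genes are present fits several different pairs, which illustrates why presence/absence calls alone can leave the haplotype-pair prediction ambiguous.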

Original language: English (US)
Article number: 583013
Journal: Frontiers in Immunology
Volume: 11
State: Published - Nov 26 2020

Bibliographical note

Funding Information:
This study makes use of data generated by the Genome of the Netherlands Project. A full list of the investigators is available from www.nlgenome.nl. Funding for the project was provided by the Netherlands Organization for Scientific Research under award number 184021007, dated July 9, 2009 and made available as a Rainbow Project of the Biobanking and Biomolecular Research Infrastructure Netherlands (BBMRI-NL). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI). The samples were provided by The LifeLines Cohort Study (21), and the generation and management of its GWAS genotype data are supported by the Netherlands Organization of Scientific Research (NWO, grant 175.010.2007.006), the Dutch government’s Economic Structure Enhancing Fund (FES), the Ministry of Economic Affairs, the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the Northern Netherlands Collaboration of Provinces (SNN), the Province of Groningen, the University Medical Center Groningen, the University of Groningen, the Dutch Kidney Foundation and Dutch Diabetes Research Foundation. Thanks to Christian Hammer and his team from Genentech for assistance testing and debugging the application. Also, thanks to Cynthia Vierra-Green and Martin Maiers from the Center for International Blood and Marrow Transplant Research (CIBMTR) as well as Julia Udell from Mayo Clinic and the University of Minnesota for KIR consultation and consolation.

Publisher Copyright:
Copyright © 2020 Roe and Kuang.

Keywords

  • genotype
  • haplotype
  • interpretation
  • killer-cell immunoglobulin-like receptor
  • natural killer
  • whole genome sequencing (WGS)

PubMed: MeSH publication types

  • Journal Article
  • Research Support, Non-U.S. Gov't
