Background: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals.
Methods: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651).
Results: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
Conclusions: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
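The two computational steps named in the Methods and Results, computing a genetically predicted protein level and filtering associations at a false discovery rate of q<0.1, can be illustrated with a minimal Python sketch. The abstract does not state the form of the genetic predictors or which false discovery rate procedure was used, so the sketch assumes a linear weighted sum of allele dosages and the Benjamini-Hochberg procedure. The function names, toy weights, and toy P values are all hypothetical and are not taken from the study.

```python
import numpy as np


def predict_protein_levels(dosages, weights):
    """Predicted level per individual as a weighted sum of allele dosages.

    Assumption: a linear predictor; the abstract does not specify the model.
    dosages: array of shape (n_individuals, n_variants), values in [0, 2]
    weights: array of shape (n_variants,), hypothetical per-variant effects
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order.

    Assumption: the abstract reports an FDR threshold but not the procedure.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted P value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


# Toy data: 2 individuals, 3 variants (hypothetical values).
levels = predict_protein_levels([[0, 1, 2], [2, 0, 1]], [0.5, -0.2, 0.1])

# Toy P values for 4 protein-phenotype tests (hypothetical values).
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = qvals < 0.1  # threshold reported in the abstract
```

In an analysis of this kind, the q-values would be computed across all protein-phenotype tests, and only pairs below the threshold would be carried forward for validation against directly measured protein levels.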
Bibliographical note
Funding Information:
This work was supported by a career development award from the Vanderbilt Faculty Research Scholars Fund (Dr Mosley), the American Heart Association (16FTF30130005) (Dr Mosley), the Pharmacogenomics Research Network/National Institutes of Health (NIH; P50-GM115305), and NIH R01LM010685 (Dr Denny). Proteomics analyses were supported by NIH R01HL133870-01A1 and NIH R01HL132320-01 (Drs Gerszten, Wang, and Vasan) and 5T32HL007208 and the John S. LaDue Memorial Fellowship in Cardiology at Harvard Medical School (Dr Benson). BioVU is supported by institutional funding, the 1S10RR025141-01 instrumentation award, and Clinical and Translational Science Award grant UL1TR000445 from the National Center for Advancing Translational Sciences/NIH, and analytical support is provided through P30CA068485 and P30EY08126. VANTAGE and VANGARD core facilities are supported, in part, by the Vanderbilt-Ingram Cancer Center and Vanderbilt Vision Center. The eMERGE Network was initiated and funded by National Human Genome Research Institute/NIH through the following grants: U01HG006828 (Cincinnati Children’s Hospital Medical Center/Boston Children’s Hospital); U01HG006830 (Children’s Hospital of Philadelphia); U01HG006389 (Essentia Institute of Rural Health, Marshfield Clinic Research Foundation and Pennsylvania State University); U01HG006382 (Geisinger Clinic); U01HG008657 (Kaiser Permanente/University of Washington); U01HG006379 (Mayo Clinic); U01HG006380 (Icahn School of Medicine at Mount Sinai); U01HG006388 (Northwestern University); U01HG006378 (Vanderbilt University Medical Center); U01HG006385 (Vanderbilt University Medical Center serving as the Coordinating Center); and U01HG8685 (Brigham and Women’s Hospital).
- electronic health records