A multilevel Bayesian approach to improve effect size estimation in regression modeling of metabolomics data utilizing imputation with uncertainty

Christopher E. Gillies, Theodore S. Jennaro, Michael A. Puskarich, Ruchi Sharma, Kevin R. Ward, Xudong Fan, Alan E. Jones, Kathleen A. Stringer

Research output: Contribution to journal › Article › peer-review

Abstract

To ensure scientific reproducibility of metabolomics data, alternative statistical methods are needed. A paradigm shift away from the p-value toward embracing uncertainty and interval estimation of a metabolite’s true effect size may lead to improved study design and greater reproducibility. Multilevel Bayesian models are one approach that offers the added opportunity of incorporating imputed value uncertainty when missing data are present. We designed simulations of metabolomics data to compare multilevel Bayesian models to standard logistic regression with corrections for multiple hypothesis testing. Our simulations altered the sample size and the fraction of significant metabolites truly different between two outcome groups. We then introduced missingness to further assess model performance. Across simulations, the multilevel Bayesian approach more accurately estimated the effect size of metabolites that were significantly different between groups. Bayesian models also had greater power and mitigated the false discovery rate. In the presence of increased missing data, Bayesian models were able to accurately impute the true concentration, and incorporating the uncertainty of these estimates improved overall prediction. In summary, our simulations demonstrate that a multilevel Bayesian approach accurately quantifies the estimated effect size of metabolite predictors in regression modeling, particularly in the presence of missing data.
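
The intuition behind the effect size claim can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes a normal-normal setup with a known, shared standard error and uses an empirical-Bayes (method-of-moments) shrinkage weight in place of a full multilevel Bayesian fit. All function names, parameters, and simulation settings below are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_estimates(n_metabolites=200, frac_nonzero=0.1,
                       true_effect=0.8, se=0.4, seed=0):
    """Draw true effects and noisy per-metabolite estimates (hypothetical setup)."""
    rng = random.Random(seed)
    # A fraction of metabolites truly differ between groups; the rest have zero effect.
    truths = [true_effect if rng.random() < frac_nonzero else 0.0
              for _ in range(n_metabolites)]
    # Each independent (unpooled) estimate is the truth plus sampling noise.
    estimates = [t + rng.gauss(0.0, se) for t in truths]
    return truths, estimates


def partial_pool(estimates, se):
    """Shrink each estimate toward the grand mean (normal-normal model, known se)."""
    mu = statistics.fmean(estimates)
    total_var = statistics.pvariance(estimates, mu)
    # Method-of-moments estimate of the between-metabolite variance, floored at 0.
    tau2 = max(total_var - se ** 2, 0.0)
    # Weight 0 means complete pooling; weight 1 means no pooling.
    weight = tau2 / (tau2 + se ** 2)
    return [mu + weight * (e - mu) for e in estimates]


def mse(values, truths):
    """Mean squared error of estimates against the simulated true effects."""
    return statistics.fmean([(v - t) ** 2 for v, t in zip(values, truths)])


if __name__ == "__main__":
    truths, raw = simulate_estimates()
    pooled = partial_pool(raw, se=0.4)
    print("MSE, unpooled estimates:", round(mse(raw, truths), 4))
    print("MSE, partially pooled estimates:", round(mse(pooled, truths), 4))
```

Under these assumptions, the partially pooled estimates borrow strength across metabolites, which is the mechanism a multilevel model exploits; a full Bayesian treatment would additionally yield interval estimates and could propagate the uncertainty of imputed values.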

Original language: English (US)
Article number: 319
Pages (from-to): 1-19
Number of pages: 19
Journal: Metabolites
Volume: 10
Issue number: 8
State: Published - Aug 2020

Bibliographical note

Funding Information:
This study was supported by the Michigan Institute for Data Science “Propelling Original Data Science” grant from the University of Michigan. The generation of the original metabolomics data was supported by the National Institute of General Medical Sciences (NIGMS) via R01GM103799 (AEJ), K23GM113041 (MAP), and R01GM111400 (KAS). The generation of the GC-MS data was supported by the National Heart Lung and Blood Institute (NHLBI) (1-R21-HL-139156-01 to XF), the NIH Center for Accelerated Innovations at Cleveland Clinic (Program Prime Award Number: 1UH54HL119810-05; Project Number: NCAI-17-7-APP-UMICH-Fan), the Michigan Translation and Commercialization (MTRAC) for Life Sciences Hub, and the Michigan Center for Integrative Research in Critical Care. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIGMS, NHLBI, or the NIH.

Keywords

  • Bayesian statistics
  • Hierarchical modeling
  • Imputation
  • Missing values
  • Multiple test corrections
  • Nuclear magnetic resonance spectroscopy

PubMed: MeSH publication types

  • Journal Article
