A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data

Julian Wolfson; Sunayan Bandyopadhyay; Mohamed Elidrisi; Gabriela Vazquez-Benitez; David M. Vock; Donald Musgrove; Gediminas Adomavicius; Paul E. Johnson; Patrick J. O'Connor

doi:10.1002/sim.6526

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.

Original language	English (US)
Pages (from-to)	2941-2957
Number of pages	17
Journal	Statistics in Medicine
Volume	34
Issue number	21
DOIs	https://doi.org/10.1002/sim.6526
State	Published - Sep 20 2015

Bibliographical note

Publisher Copyright:
© 2015 John Wiley & Sons, Ltd.

Keywords

Electronic health records
Machine learning
Naive Bayes
Risk prediction
Survival analysis

Cite this

@article{b6f98f989d214f27a23bbb7d64100d7f,

title = "A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data",

abstract = "Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.",

keywords = "Electronic health records, Machine learning, Naive Bayes, Risk prediction, Survival analysis",

author = "Julian Wolfson and Sunayan Bandyopadhyay and Mohamed Elidrisi and Gabriela Vazquez-Benitez and Vock, {David M.} and Donald Musgrove and Gediminas Adomavicius and Johnson, {Paul E.} and O'Connor, {Patrick J.}",

note = "Publisher Copyright: {\textcopyright} 2015 John Wiley \& Sons, Ltd.",

year = "2015",

month = sep,

day = "20",

doi = "10.1002/sim.6526",

language = "English (US)",

volume = "34",

pages = "2941--2957",

journal = "Statistics in Medicine",

issn = "0277-6715",

publisher = "John Wiley and Sons Ltd",

number = "21",

}

TY - JOUR

T1 - A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data

AU - Wolfson, Julian

AU - Bandyopadhyay, Sunayan

AU - Elidrisi, Mohamed

AU - Vazquez-Benitez, Gabriela

AU - Vock, David M.

AU - Musgrove, Donald

AU - Adomavicius, Gediminas

AU - Johnson, Paul E.

AU - O'Connor, Patrick J.

PY - 2015/9/20

Y1 - 2015/9/20

N2 - Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system.

KW - Electronic health records

KW - Machine learning

KW - Naive Bayes

KW - Risk prediction

KW - Survival analysis

UR - http://www.scopus.com/inward/record.url?scp=84938292482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938292482&partnerID=8YFLogxK

U2 - 10.1002/sim.6526

DO - 10.1002/sim.6526

M3 - Article

C2 - 25980520

AN - SCOPUS:84938292482

SN - 0277-6715

VL - 34

SP - 2941

EP - 2957

JO - Statistics in Medicine

JF - Statistics in Medicine

IS - 21

ER -
