MEBoost: Variable selection in the presence of measurement error

Ben Brown; Timothy Weaver; Julian Wolfson

doi:10.1002/sim.8130

Biostatistics

Research output: Contribution to journal (Article, peer-reviewed)

16 Scopus citations

Abstract

We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, Measurement Error Boosting (MEBoost), follows a path defined by estimating equations that correct for covariate measurement error. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition in which several variables are based on self-report and hence measured with error; our goal is to select, from a large data set, the variables related to the number of times a subject binge ate in the last 28 days. Furthermore, in a simulation study, we evaluated our method and compared its performance with the recently proposed Convex Conditioned Lasso and with the “naive” Lasso, which does not correct for measurement error. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy occurred to a lesser degree when using MEBoost. Through simulations, we also make a case for the consistency of the model selected.
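The abstract describes MEBoost as following a path defined by measurement-error-corrected estimating equations. The code below is a minimal sketch of that idea under two stated assumptions. First, it uses an EEBoost-style update: at each step, the coefficient whose estimating-equation component is largest in absolute value moves by a small fixed step in the direction of that component's sign. Second, it uses the corrected score for linear regression with additive error W = X + U of known covariance Σ_u, namely U(β) = Wᵀ(y − Wβ) + n Σ_u β. Both choices, and the names `corrected_score` and `meboost_path`, are illustrative assumptions rather than the authors' implementation. Consult the paper for the exact estimating equations, step size, and stopping rule.

```python
# Hypothetical sketch: an EEBoost-style stagewise path driven by a
# measurement-error-corrected estimating equation for linear regression.
# Assumes additive error W = X + U with known error covariance sigma_u.
# This is an illustration, not the MEBoost authors' implementation.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def corrected_score(W: Matrix, y: Vector, beta: Vector, sigma_u: Matrix) -> Vector:
    """Corrected score U(beta) = W^T (y - W beta) + n * sigma_u @ beta."""
    n, p = len(y), len(beta)
    # Residuals computed from the error-prone covariates W.
    resid = [y[i] - sum(W[i][k] * beta[k] for k in range(p)) for i in range(n)]
    score = []
    for j in range(p):
        naive = sum(W[i][j] * resid[i] for i in range(n))
        # In expectation W^T W = X^T X + n * sigma_u (U mean zero, independent
        # of X and y), so adding n * sigma_u @ beta offsets that extra term.
        correction = n * sum(sigma_u[j][k] * beta[k] for k in range(p))
        score.append(naive + correction)
    return score


def meboost_path(W: Matrix, y: Vector, sigma_u: Matrix,
                 step: float = 0.01, n_steps: int = 500) -> List[Vector]:
    """Return the coefficient path; choosing where to stop is left to the caller."""
    p = len(W[0])
    beta = [0.0] * p
    path = [list(beta)]
    for _ in range(n_steps):
        u = corrected_score(W, y, beta, sigma_u)
        # Pick the coordinate with the largest |U_j(beta)|.
        j = max(range(p), key=lambda k: abs(u[k]))
        if u[j] == 0.0:
            break  # estimating equation solved exactly; nothing left to move
        # Move that coordinate a small fixed step in the direction of U_j.
        beta[j] += step if u[j] > 0 else -step
        path.append(list(beta))
    return path


if __name__ == "__main__":
    # Toy data: y depends only on the first column; sigma_u = 0 means no correction.
    W = [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 1], [-1, -1]]
    y = [2, 0, -2, 0, 2, -2]
    path = meboost_path(W, y, sigma_u=[[0.0, 0.0], [0.0, 0.0]])
    print(path[-1])  # first coordinate ends within one step of 2, second stays 0
```

With `sigma_u` set to zero, the update reduces to ordinary forward-stagewise fitting of least squares. A nonzero `sigma_u` adds n Σ_u β to the score, which in expectation counteracts the attenuation that measurement error induces in the naive fit. In practice, the stopping point along the path would be chosen by some criterion such as cross-validation, which this sketch does not implement.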

Original language	English (US)
Pages (from-to)	2705-2718
Number of pages	14
Journal	Statistics in Medicine
Volume	38
Issue number	15
DOI	https://doi.org/10.1002/sim.8130
Status	Published, Jul 10 2019

Bibliographical note

Publisher Copyright:
© 2019 John Wiley & Sons, Ltd.

Keywords

boosting
high-dimensional data
machine learning
measurement error
variable selection

Cite this

@article{bf8c63e76e4a4ae1bcfaaf7f89e362aa,
  title = "{MEBoost}: Variable selection in the presence of measurement error",
  abstract = "We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, Measurement Error Boosting (MEBoost), follows a path defined by estimating equations that correct for covariate measurement error. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition where several variables are based on self-report and, hence, measured with error, where we are interested in performing model selection from a large data set to select variables that are related to the number of times a subject binge ate in the last 28 days. Furthermore, we evaluated our method and compared its performance to the recently proposed Convex Conditioned Lasso and to the ``naive'' Lasso, which does not correct for measurement error through a simulation study. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy occurred to a lesser degree when using MEBoost. Through simulations, we also make a case for the consistency of the model selected.",
  keywords = "boosting, high-dimensional data, machine learning, measurement error, variable selection",
  author = "Ben Brown and Timothy Weaver and Julian Wolfson",
  note = "Publisher Copyright: {\textcopyright} 2019 John Wiley \& Sons, Ltd.",
  year = "2019",
  month = jul,
  day = "10",
  doi = "10.1002/sim.8130",
  language = "English (US)",
  volume = "38",
  pages = "2705--2718",
  journal = "Statistics in Medicine",
  issn = "0277-6715",
  publisher = "John Wiley and Sons Ltd",
  number = "15",
}

TY  - JOUR
T1  - MEBoost: Variable selection in the presence of measurement error
AU  - Brown, Ben
AU  - Weaver, Timothy
AU  - Wolfson, Julian
PY  - 2019/7/10
Y1  - 2019/7/10
N2  - We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, Measurement Error Boosting (MEBoost), follows a path defined by estimating equations that correct for covariate measurement error. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition where several variables are based on self-report and, hence, measured with error, where we are interested in performing model selection from a large data set to select variables that are related to the number of times a subject binge ate in the last 28 days. Furthermore, we evaluated our method and compared its performance to the recently proposed Convex Conditioned Lasso and to the “naive” Lasso, which does not correct for measurement error through a simulation study. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy occurred to a lesser degree when using MEBoost. Through simulations, we also make a case for the consistency of the model selected.
AB  - We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, Measurement Error Boosting (MEBoost), follows a path defined by estimating equations that correct for covariate measurement error. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition where several variables are based on self-report and, hence, measured with error, where we are interested in performing model selection from a large data set to select variables that are related to the number of times a subject binge ate in the last 28 days. Furthermore, we evaluated our method and compared its performance to the recently proposed Convex Conditioned Lasso and to the “naive” Lasso, which does not correct for measurement error through a simulation study. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy occurred to a lesser degree when using MEBoost. Through simulations, we also make a case for the consistency of the model selected.
KW  - boosting
KW  - high-dimensional data
KW  - machine learning
KW  - measurement error
KW  - variable selection
UR  - http://www.scopus.com/inward/record.url?scp=85062785118&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85062785118&partnerID=8YFLogxK
U2  - 10.1002/sim.8130
DO  - 10.1002/sim.8130
M3  - Article
C2  - 30856279
AN  - SCOPUS:85062785118
SN  - 0277-6715
VL  - 38
SP  - 2705
EP  - 2718
JO  - Statistics in Medicine
JF  - Statistics in Medicine
IS  - 15
ER  - 
