TY - JOUR
T1 - EEBoost
T2 - A general method for prediction and variable selection based on estimating equations
AU - Wolfson, Julian
PY - 2011/3
Y1 - 2011/3
N2 - The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may misspecify or fail to capture important aspects of the data generating process such as missingness, correlation, and over/underdispersion. In this article we describe EEBoost, a strategy for variable selection and prediction which can be applied in high-dimensional settings where inference for low-dimensional parameters would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software. The EEBoost algorithm is obtained as a modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces similar variable selection paths to penalized methods without the need to apply constrained optimization algorithms. The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes and time-to-event data with missing covariates. In both cases, EEBoost outperforms variable selection methods which do not account for the relevant data characteristics. Furthermore, it is shown to be substantially faster to compute than competing methods based on penalized estimating equations. We also apply a version of EEBoost based on the Buckley-James estimating equations to data from an HIV treatment trial, where the aim is to identify mutations which confer resistance to antiretroviral medications. Proofs of the main results appear in the Supplemental Materials (available online).
KW - Boosting
KW - Model selection
KW - Prediction
KW - Projected likelihood
UR - http://www.scopus.com/inward/record.url?scp=79954455620&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79954455620&partnerID=8YFLogxK
U2 - 10.1198/jasa.2011.tm10098
DO - 10.1198/jasa.2011.tm10098
M3 - Article
AN - SCOPUS:79954455620
SN - 0162-1459
VL - 106
SP - 296
EP - 305
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 493
ER -