The modern statistical literature is replete with methods for performing variable selection and prediction in standard regression problems. However, simple models may be misspecified or may fail to capture important aspects of the data-generating process such as missingness, correlation, and over/underdispersion. In this article we describe EEBoost, a strategy for variable selection and prediction which can be applied in high-dimensional settings where inference for low-dimensional parameters would typically be based on estimating equations. The method is simple, flexible, and easily implemented using existing software. The EEBoost algorithm is obtained as a modification of the standard boosting (or functional gradient descent) technique. We show that EEBoost is closely related to a class of L1-constrained projected likelihood ratio minimizations, and therefore produces similar variable selection paths to penalized methods without the need to apply constrained optimization algorithms. The flexibility of EEBoost is illustrated by applying it to simulated examples with correlated outcomes and time-to-event data with missing covariates. In both cases, EEBoost outperforms variable selection methods which do not account for the relevant data characteristics. Furthermore, it is shown to be substantially faster to compute than competing methods based on penalized estimating equations. We also apply a version of EEBoost based on the Buckley-James estimating equations to data from an HIV treatment trial, where the aim is to identify mutations which confer resistance to antiretroviral medications. Proofs of the main results appear in the Supplemental Materials (available online).
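To make the idea concrete, the following is a minimal sketch of the generic boosting-with-estimating-equations scheme the abstract describes: replace the likelihood gradient in standard boosting with an arbitrary estimating function, and at each iteration take a small step on the coefficient whose estimating-equation component is largest in magnitude. The function names, the step size `eps`, and the least-squares example are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def eeboost(g, p, eps=0.01, n_steps=200):
    """Sketch of estimating-equation boosting (assumed form, not the
    paper's code). g(beta) returns a length-p estimating function;
    each step nudges the coefficient with the largest |g_j(beta)|."""
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        gb = g(beta)
        j = int(np.argmax(np.abs(gb)))     # most-violated equation
        beta[j] += eps * np.sign(gb[j])    # small step toward solving it
        path.append(beta.copy())
    return np.array(path)

# Illustrative special case: ordinary least-squares estimating
# equations g(beta) = X^T (y - X beta) / n, i.e. the Gaussian score.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(n)
g = lambda beta: X.T @ (y - X @ beta) / n

path = eeboost(g, p)
# Early iterations modify only the strongest predictor, tracing out
# a variable-selection path similar to an L1-penalized solution path.
```

Because only the choice of `g` changes, the same loop accommodates GEE-type equations for correlated outcomes or Buckley-James equations for censored data, which is the flexibility the abstract emphasizes.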
- Model selection
- Projected likelihood