High-dimensional variable selection with right-censored length-biased data

Di He, Yong Zhou, Hui Zou

Research output: Contribution to journal › Article › peer-review


Length-biased data are common in various fields, including epidemiology and labor economics, and they have attracted considerable attention in the survival literature. A crucial goal of a survival analysis is to identify a subset of risk factors and their risk contributions from among a vast number of clinical covariates. However, there is no research on variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. Therefore, we propose an estimation method based on penalized estimating equations to obtain a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator is consistent in both selection and estimation. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.
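As a rough illustration of the SCAD penalty and the local linear approximation (LLA) step mentioned in the abstract, the sketch below codes the standard SCAD penalty, its derivative, and the per-coefficient weights that an LLA iteration would pass to a weighted L1 problem. It is not the authors' implementation: the function names, the default a = 3.7, and the example values are assumptions made here for illustration, and the length-biased estimating equations themselves are not reproduced.

```python
# Minimal sketch of the standard SCAD penalty and LLA weights.
# Assumptions: function names and the default a = 3.7 are illustrative;
# this is not the paper's code and omits the estimating equations.

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty value at |t| for tuning parameter lam (requires a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t                      # L1-like region near zero
    if t <= a * lam:
        # quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant: no extra shrinkage


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|."""
    t = abs(t)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0


def lla_weights(beta, lam, a=3.7):
    """LLA step: weight for each coefficient in the next weighted-L1 problem.

    Large current coefficients get weight 0 (left unpenalized), while
    small ones get the full weight lam.
    """
    return [scad_derivative(b, lam, a) for b in beta]


if __name__ == "__main__":
    current_beta = [0.0, 2.0, 5.0]          # hypothetical current estimate
    print(lla_weights(current_beta, lam=1.0))
```

Under these assumptions, each LLA iteration would recompute the weights from the current estimate and then solve a weighted L1-penalized problem; a multistage procedure would repeat this starting from the previous stage's solution.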

Original language: English (US)
Pages (from-to): 193-215
Number of pages: 23
Journal: Statistica Sinica
Issue number: 1
State: Published - Jan 2020

Bibliographical note

Funding Information:
We thank the editor, associate editor, and referees for their helpful comments and suggestions. Zou’s work was supported, in part, by NSF grant DMS-1505111. Zhou’s work was supported by the State Key Program in the Major Research Plan of National Natural Science Foundation of China (91546202) and the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE.

Publisher Copyright:
© 2020 Institute of Statistical Science. All rights reserved.


  • Accelerated failure time model
  • High-dimensional variable selection
  • Length-biased data
  • Multi-stage penalization

