High-dimensional variable selection with right-censored length-biased data

Di He; Yong Zhou; Hui Zou

doi:10.5705/SS.202018.0089

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Length-biased data are common in various fields, including epidemiology and labor economics, and they have attracted considerable attention in survival literature. A crucial goal of a survival analysis is to identify a subset of risk factors and their risk contributions from among a vast number of clinical covariates. However, there is no research on variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. Therefore, we propose an estimation method based on penalized estimating equations to obtain a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator possesses the selection and estimation consistency property. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.
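The abstract names a SCAD penalty fitted through a local linear approximation (LLA). As a rough illustration only, the sketch below implements the standard SCAD penalty and its derivative, which supplies the per-coefficient L1 weights in one LLA step. The function names, the conventional default a = 3.7, and the example values are assumptions for illustration; the paper's estimating equations for length-biased data are not given in this record and are not reproduced here.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty p_lam(|t|); a = 3.7 is a conventional default (assumption)."""
    t = abs(t)
    if t <= lam:
        # Lasso-like linear region near zero
        return lam * t
    if t <= a * lam:
        # Quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no extra penalty
    return lam * lam * (a + 1) / 2


def scad_derivative(t, lam, a=3.7):
    """Derivative p'_lam(|t|) of the SCAD penalty with respect to |t|."""
    t = abs(t)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0


def lla_weights(beta_init, lam, a=3.7):
    """Weights for one LLA step: the SCAD penalty is replaced by a weighted
    L1 penalty sum_j w_j * |beta_j| with w_j = p'_lam(|beta_init_j|)."""
    return [scad_derivative(b, lam, a) for b in beta_init]


if __name__ == "__main__":
    # Hypothetical initial estimate: a zero, a moderate, and a large coefficient
    print(lla_weights([0.0, 2.0, 5.0], lam=1.0, a=3.0))
```

Under these weights a coefficient whose initial estimate is near zero keeps the full penalty, while a large one is left unpenalized, which is what lets SCAD reduce the estimation bias of a plain L1 penalty.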

Original language	English (US)
Pages (from-to)	193-215
Number of pages	23
Journal	Statistica Sinica
Volume	30
Issue number	1
DOIs	https://doi.org/10.5705/SS.202018.0089
State	Published - Jan 2020

Bibliographical note

Funding Information:
We thank the editor, associate editor, and referees for their helpful comments and suggestions. Zou’s work was supported, in part, by NSF grant DMS-1505111. Zhou’s work was supported by the State Key Program in the Major Research Plan of National Natural Science Foundation of China (91546202) and the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE.

Publisher Copyright:
© 2020 Institute of Statistical Science. All rights reserved.

Keywords

Accelerated failure time model
High-dimensional variable selection
Length-biased data
Multi-stage penalization

Cite this

@article{6c028b89e59d4240a3ac2048d8f5db5b,
  title = "High-dimensional variable selection with right-censored length-biased data",
  abstract = "Length-biased data are common in various fields, including epidemiology and labor economics, and they have attracted considerable attention in survival literature. A crucial goal of a survival analysis is to identify a subset of risk factors and their risk contributions from among a vast number of clinical covariates. However, there is no research on variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. Therefore, we propose an estimation method based on penalized estimating equations to obtain a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator possesses the selection and estimation consistency property. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.",
  keywords = "Accelerated failure time model, High-dimensional variable selection, Length-biased data, Multi-stage penalization",
  author = "Di He and Yong Zhou and Hui Zou",
  note = "Funding Information: We thank the editor, associate editor, and referees for their helpful comments and suggestions. Zou{\textquoteright}s work was supported, in part, by NSF grant DMS-1505111. Zhou{\textquoteright}s work was supported by the State Key Program in the Major Research Plan of National Natural Science Foundation of China (91546202) and the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE. Publisher Copyright: {\textcopyright} 2020 Institute of Statistical Science. All rights reserved.",
  year = "2020",
  month = jan,
  doi = "10.5705/SS.202018.0089",
  language = "English (US)",
  volume = "30",
  pages = "193--215",
  journal = "Statistica Sinica",
  issn = "1017-0405",
  publisher = "Institute of Statistical Science",
  number = "1",
}

TY  - JOUR
T1  - High-dimensional variable selection with right-censored length-biased data
AU  - He, Di
AU  - Zhou, Yong
AU  - Zou, Hui
N1  - Funding Information: We thank the editor, associate editor, and referees for their helpful comments and suggestions. Zou’s work was supported, in part, by NSF grant DMS-1505111. Zhou’s work was supported by the State Key Program in the Major Research Plan of National Natural Science Foundation of China (91546202) and the Key Laboratory of Advanced Theory and Application in Statistics and Data Science, MOE. Publisher Copyright: © 2020 Institute of Statistical Science. All rights reserved.
PY  - 2020/1
Y1  - 2020/1
N2  - Length-biased data are common in various fields, including epidemiology and labor economics, and they have attracted considerable attention in survival literature. A crucial goal of a survival analysis is to identify a subset of risk factors and their risk contributions from among a vast number of clinical covariates. However, there is no research on variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. Therefore, we propose an estimation method based on penalized estimating equations to obtain a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator possesses the selection and estimation consistency property. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.
AB  - Length-biased data are common in various fields, including epidemiology and labor economics, and they have attracted considerable attention in survival literature. A crucial goal of a survival analysis is to identify a subset of risk factors and their risk contributions from among a vast number of clinical covariates. However, there is no research on variable selection for length-biased data, owing to the complex nature of such data and the lack of a convenient loss function. Therefore, we propose an estimation method based on penalized estimating equations to obtain a sparse and consistent estimator for length-biased data under an accelerated failure time model. The proposed estimator possesses the selection and estimation consistency property. In particular, we implement our method using a SCAD penalty and a local linear approximation algorithm. We suggest selecting the tuning parameter using the extended BIC in high-dimensional settings. Furthermore, we develop a novel multistage SCAD penalized estimating equation procedure to achieve improved estimation accuracy and sparsity in the variable selection. Simulation studies show that the proposed procedure has high accuracy and almost perfect sparsity. Oscar Awards data are analyzed as an application of the proposed method.
KW  - Accelerated failure time model
KW  - High-dimensional variable selection
KW  - Length-biased data
KW  - Multi-stage penalization
UR  - http://www.scopus.com/inward/record.url?scp=85084745936&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85084745936&partnerID=8YFLogxK
U2  - 10.5705/SS.202018.0089
DO  - 10.5705/SS.202018.0089
M3  - Article
AN  - SCOPUS:85084745936
SN  - 1017-0405
VL  - 30
SP  - 193
EP  - 215
JO  - Statistica Sinica
JF  - Statistica Sinica
IS  - 1
ER  - 
