TY - JOUR
T1 - GEE analysis of clustered binary data with diverging number of covariates
AU - Wang, Lan
PY - 2011/2
Y1 - 2011/2
N2 - Clustered binary data with a large number of covariates have become increasingly common in many scientific disciplines. This paper develops an asymptotic theory for generalized estimating equations (GEE) analysis of clustered binary data when the number of covariates grows to infinity with the number of clusters. In this "large n, diverging p" framework, we provide appropriate regularity conditions and establish the existence, consistency and asymptotic normality of the GEE estimator. Furthermore, we prove that the sandwich variance formula remains valid. Even when the working correlation matrix is misspecified, the use of the sandwich variance formula leads to an asymptotically valid confidence interval and Wald test for an estimable linear combination of the unknown parameters. The accuracy of the asymptotic approximation is examined via numerical simulations. We also discuss the "diverging p" asymptotic theory for general GEE. The results in this paper extend the recent elegant work of Xie and Yang [Ann. Statist. 31 (2003) 310-347] and Balan and Schiopu-Kratina [Ann. Statist. 32 (2005) 522-541] in the "fixed p" setting.
AB - Clustered binary data with a large number of covariates have become increasingly common in many scientific disciplines. This paper develops an asymptotic theory for generalized estimating equations (GEE) analysis of clustered binary data when the number of covariates grows to infinity with the number of clusters. In this "large n, diverging p" framework, we provide appropriate regularity conditions and establish the existence, consistency and asymptotic normality of the GEE estimator. Furthermore, we prove that the sandwich variance formula remains valid. Even when the working correlation matrix is misspecified, the use of the sandwich variance formula leads to an asymptotically valid confidence interval and Wald test for an estimable linear combination of the unknown parameters. The accuracy of the asymptotic approximation is examined via numerical simulations. We also discuss the "diverging p" asymptotic theory for general GEE. The results in this paper extend the recent elegant work of Xie and Yang [Ann. Statist. 31 (2003) 310-347] and Balan and Schiopu-Kratina [Ann. Statist. 32 (2005) 522-541] in the "fixed p" setting.
KW - Clustered binary data
KW - Generalized estimating equations (GEE)
KW - High-dimensional covariates
KW - Sandwich variance formula
UR - http://www.scopus.com/inward/record.url?scp=79551587551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79551587551&partnerID=8YFLogxK
U2 - 10.1214/10-AOS846
DO - 10.1214/10-AOS846
M3 - Article
AN - SCOPUS:79551587551
SN - 0090-5364
VL - 39
SP - 389
EP - 417
JO - Annals of Statistics
JF - Annals of Statistics
IS - 1
ER -