TY - JOUR
T1 - Semiparametric estimation with data missing not at random using an instrumental variable
AU - Sun, Bao Luo
AU - Liu, Lan
AU - Miao, Wang
AU - Wirth, Kathleen
AU - Robins, James
AU - Tchetgen, Eric J.
PY - 2018/10
Y1 - 2018/10
AB - Missing data occur frequently in empirical studies in the health and social sciences, and can compromise our ability to obtain valid inference. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an instrumental variable (IV) is observed for all subjects that satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In this paper, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose a novel doubly robust estimator of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs.
KW - Doubly robust
KW - Instrumental variable
KW - Inverse probability weighting
KW - Missing not at random
UR - http://www.scopus.com/inward/record.url?scp=85053915352&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053915352&partnerID=8YFLogxK
U2 - 10.5705/ss.202016.0324
DO - 10.5705/ss.202016.0324
M3 - Article
SN - 1017-0405
VL - 28
SP - 1965
EP - 1983
JO - Statistica Sinica
JF - Statistica Sinica
IS - 4
ER -