Multiple imputation using dimension reduction techniques for high-dimensional data

Domonique W. Hodge; Sandra E. Safo; Qi Long

Biostatistics

Research output: Contribution to journal › Article

Abstract

Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias, lose efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely used methods for handling missing data, a popularity that can be partly attributed to its ease of use. However, existing MI methods implemented in most statistical software are either not applicable to, or do not perform well in, high-dimensional settings where the number of predictors is large relative to the sample size. To remedy this issue, we develop an MI approach that uses dimension reduction techniques. Specifically, in constructing imputation models in the presence of high-dimensional data, our approach uses sure independence screening (SIS) followed by either sparse principal component analysis (sPCA) or sufficient dimension reduction (SDR) techniques. Our simulation studies, conducted for high-dimensional data, demonstrate that MI using SIS followed by sPCA outperforms the other imputation methods considered, including several existing imputation approaches. We apply our approach to the analysis of gene expression data from a prostate cancer study.
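The SIS-then-sPCA pipeline described above can be illustrated with a short NumPy sketch. This is a minimal example under assumptions of our own, not the authors' implementation: only one continuous variable y has missing values while the predictor matrix X is fully observed; the sPCA step is a simple soft-thresholded power iteration with deflation (one of many sPCA variants); the imputation model is a Bayesian linear regression on the sparse component scores, with parameter draws in the style of the usual normal-linear-model MI method; and completed data sets are combined with Rubin's rules. The function names `sis_screen`, `sparse_pcs`, `sis_spca_mi`, `rubin_pool` and the tuning values `d`, `k`, `lam` are hypothetical. The SDR alternative is not shown.

```python
import numpy as np

# Illustrative sketch only: helper names and tuning values are hypothetical,
# not taken from the paper or from any package.


def sis_screen(X, y, d):
    """SIS: indices of the d columns of X with the largest |corr(X_j, y)|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[::-1][:d]


def sparse_pcs(Z, k, lam, n_iter=200):
    """k sparse loading vectors via soft-thresholded power iteration with
    deflation (a simple sPCA variant chosen here for illustration)."""
    Zc = Z - Z.mean(axis=0)
    loadings = []
    for _ in range(k):
        v = np.linalg.svd(Zc, full_matrices=False)[2][0]  # dense starting vector
        for _step in range(n_iter):
            u = Zc @ v
            if not u.any():
                break
            w = Zc.T @ (u / np.linalg.norm(u))
            w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # soft-threshold
            if not w.any():
                break
            v = w / np.linalg.norm(w)
        loadings.append(v)
        Zc = Zc - np.outer(Zc @ v, v)  # remove this direction before the next one
    return np.column_stack(loadings)


def sis_spca_mi(X, y, m=5, d=20, k=2, lam=0.5, seed=0):
    """Return m completed copies of y. Assumes X is fully observed and only
    the continuous variable y contains missing values (NaN)."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    obs = ~miss
    keep = sis_screen(X[obs], y[obs], d)              # step 1: screening
    Z = X[:, keep]
    V = sparse_pcs(Z[obs], k, lam)                    # step 2: sPCA on survivors
    scores = (Z - Z[obs].mean(axis=0)) @ V
    D = np.column_stack([np.ones(len(y)), scores])    # intercept + component scores
    Do, yo = D[obs], y[obs]
    n, p = Do.shape
    beta_hat, *_ = np.linalg.lstsq(Do, yo, rcond=None)
    rss = np.sum((yo - Do @ beta_hat) ** 2)
    chol = np.linalg.cholesky(np.linalg.inv(Do.T @ Do))
    completed = []
    for _ in range(m):                                # step 3: m stochastic imputations
        sigma2 = rss / rng.chisquare(n - p)           # draw the residual variance
        beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(p))
        y_imp = y.copy()
        y_imp[miss] = D[miss] @ beta + np.sqrt(sigma2) * rng.standard_normal(miss.sum())
        completed.append(y_imp)
    return completed


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled point estimate and total variance."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    b = q.var(ddof=1)                                 # between-imputation variance
    return q.mean(), u.mean() + (1 + 1 / len(q)) * b


# Simulated example: more predictors than subjects, y driven by a latent factor.
rng = np.random.default_rng(1)
n, p = 120, 400
f = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, :5] += 2 * f[:, None]                            # first 5 predictors load on f
y = 1.5 * f + 0.5 * rng.standard_normal(n)
y_obs = y.copy()
y_obs[:30] = np.nan                                   # 30 outcomes missing at random
copies = sis_spca_mi(X, y_obs, m=5, d=10, k=2, lam=0.5, seed=2)
est, var = rubin_pool([c.mean() for c in copies],
                      [c.var(ddof=1) / len(c) for c in copies])
print(f"pooled mean of y: {est:.3f}, total variance: {var:.5f}")
```

In this sketch the screening and sPCA steps are run once on the observed cases, and only the regression parameters are redrawn for each imputation. A variant that reruns both steps on a bootstrap sample for every imputation would also carry the uncertainty of variable selection into the pooled variance.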

Original language	English (US)
Journal	arXiv
State	Published - May 13 2019

Keywords

stat.ME

Access

arXiv:1905.05274v1

Cite this

@article{bafdb44a53e84d0c9c0b7f4822616792,
  title = "Multiple imputation using dimension reduction techniques for high-dimensional data",
  abstract = "Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias, lose efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely used methods for handling missing data, a popularity that can be partly attributed to its ease of use. However, existing MI methods implemented in most statistical software are either not applicable to, or do not perform well in, high-dimensional settings where the number of predictors is large relative to the sample size. To remedy this issue, we develop an MI approach that uses dimension reduction techniques. Specifically, in constructing imputation models in the presence of high-dimensional data, our approach uses sure independence screening (SIS) followed by either sparse principal component analysis (sPCA) or sufficient dimension reduction (SDR) techniques. Our simulation studies, conducted for high-dimensional data, demonstrate that MI using SIS followed by sPCA outperforms the other imputation methods considered, including several existing imputation approaches. We apply our approach to the analysis of gene expression data from a prostate cancer study.",
  keywords = "stat.ME",
  author = "Hodge, {Domonique W.} and Safo, {Sandra E.} and Long, Qi",
  year = "2019",
  month = may,
  day = "13",
  language = "English (US)",
  journal = "arXiv",
  eprint = "1905.05274v1",
  archivePrefix = "arXiv",
}

TY  - JOUR
T1  - Multiple imputation using dimension reduction techniques for high-dimensional data
AU  - Hodge, Domonique W.
AU  - Safo, Sandra E.
AU  - Long, Qi
PY  - 2019/5/13
Y1  - 2019/5/13
N2  - Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias, lose efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely used methods for handling missing data, a popularity that can be partly attributed to its ease of use. However, existing MI methods implemented in most statistical software are either not applicable to, or do not perform well in, high-dimensional settings where the number of predictors is large relative to the sample size. To remedy this issue, we develop an MI approach that uses dimension reduction techniques. Specifically, in constructing imputation models in the presence of high-dimensional data, our approach uses sure independence screening (SIS) followed by either sparse principal component analysis (sPCA) or sufficient dimension reduction (SDR) techniques. Our simulation studies, conducted for high-dimensional data, demonstrate that MI using SIS followed by sPCA outperforms the other imputation methods considered, including several existing imputation approaches. We apply our approach to the analysis of gene expression data from a prostate cancer study.
AB  - Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias, lose efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely used methods for handling missing data, a popularity that can be partly attributed to its ease of use. However, existing MI methods implemented in most statistical software are either not applicable to, or do not perform well in, high-dimensional settings where the number of predictors is large relative to the sample size. To remedy this issue, we develop an MI approach that uses dimension reduction techniques. Specifically, in constructing imputation models in the presence of high-dimensional data, our approach uses sure independence screening (SIS) followed by either sparse principal component analysis (sPCA) or sufficient dimension reduction (SDR) techniques. Our simulation studies, conducted for high-dimensional data, demonstrate that MI using SIS followed by sPCA outperforms the other imputation methods considered, including several existing imputation approaches. We apply our approach to the analysis of gene expression data from a prostate cancer study.
KW  - stat.ME
M3  - Article
JO  - arXiv
JF  - arXiv
ER  - 
