A data-driven approach to conditional screening of high-dimensional variables

Hyokyoung G. Hong, Lan Wang, Xuming He

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Marginal screening is a widely applied technique for reducing the dimensionality of the data when the number of potential features overwhelms the sample size. By their nature, marginal screening procedures are also known for their difficulty in identifying the so-called hidden variables, which are jointly important but have weak marginal associations with the response variable. Failing to include a hidden variable in the screening stage has two undesirable consequences: (1) important features are missed in model selection, and (2) biased inference is likely to occur in the subsequent analysis. Motivated by some recent work in conditional screening, we propose a data-driven conditional screening algorithm, which is computationally efficient, enjoys the sure screening property under weaker assumptions on the model and works robustly in a variety of settings to reduce false negatives of hidden variables. Numerical comparison with alternative screening procedures is also made to shed light on the relative merit of the proposed method. We illustrate the proposed methodology using a leukaemia microarray data example.
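The hidden-variable problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes the conditioning set is already known (a single predictor), whereas the paper proposes choosing it in a data-driven way. The function names `marginal_scores` and `conditional_scores`, the simulated design and all parameter values are hypothetical choices made for illustration. The second predictor is constructed so that its population marginal covariance with the response is exactly zero even though its regression coefficient is nonzero.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute marginal correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def conditional_scores(X, y, cond_idx):
    """Absolute partial correlation of each column with y, given X[:, cond_idx].

    Assumption: the conditioning set is supplied by the user; this is not
    the data-driven selection step proposed in the paper.
    """
    n, p = X.shape
    C = np.column_stack([np.ones(n), X[:, cond_idx]])
    # Residualize y and every column of X on the conditioning set.
    coef_y, *_ = np.linalg.lstsq(C, y, rcond=None)
    coef_X, *_ = np.linalg.lstsq(C, X, rcond=None)
    ry = y - C @ coef_y
    RX = X - C @ coef_X
    norms = np.linalg.norm(RX, axis=0) * np.linalg.norm(ry)
    # Conditioning columns have (numerically) zero residuals; score them as 0.
    keep = np.ones(p, dtype=bool)
    keep[cond_idx] = False
    scores = np.zeros(p)
    scores[keep] = np.abs(RX[:, keep].T @ ry) / norms[keep]
    return scores


# Simulated design (hypothetical): column 0 is marginally strong,
# column 1 is a hidden variable, the remaining columns are pure noise.
rng = np.random.default_rng(0)
n, p, rho = 200, 500, 0.5
X = rng.standard_normal((n, p))
X[:, 1] = rho * X[:, 0] + np.sqrt(1 - rho**2) * X[:, 1]
# cov(X1, y) = rho * 1 + (-rho) * 1 = 0, yet the coefficient of X1 is -rho.
y = X[:, 0] - rho * X[:, 1] + 0.5 * rng.standard_normal(n)

marg = marginal_scores(X, y)
cond = conditional_scores(X, y, [0])

# Rank 1 means the largest screening statistic.
marg_rank = int((marg > marg[1]).sum()) + 1
cond_rank = int((cond > cond[1]).sum()) + 1
print(f"hidden variable: marginal rank {marg_rank}, conditional rank {cond_rank}")
```

In this construction the hidden variable's marginal statistic is indistinguishable from noise, while its partial correlation given the first predictor is large, which is the behaviour conditional screening is meant to exploit.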

Original language: English (US)
Pages (from-to): 200-212
Number of pages: 13
Journal: Stat
Volume: 5
Issue number: 1
State: Published - 2016

Bibliographical note

Funding Information:
We would like to thank Dr Emre Barut and Dr Vincent Vu for helpful discussions, Dr Zongming Ma for sharing his codes for sparse principal component analysis and Dr Chenlei Leng and Dr Yiyuan She for sharing their unpublished papers. H.G.H. is supported by NSA grant H98230-15-1-0260. Lan Wang is supported by NSF grant DMS-1512267. Xuming He is supported by NSF grant DMS-1307566.

Keywords

  • conditional screening
  • false negative
  • feature screening
  • high dimension
  • sparse principal component analysis
  • sure screening property
