Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering

Jinyuan Chang, Wen Zhou, Wen Xin Zhou, Lan Wang

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm that likewise avoids restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in the R package HDtest, which is available on CRAN.
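To make the testing problem in the abstract concrete, the following is a minimal sketch of one generic way to test equality of two covariance matrices without structural assumptions: take the maximum, over all entry pairs, of the studentized squared difference between the two sample covariance matrices, and calibrate it by permuting group labels. This is an illustration only. It is not the authors' procedure and not the HDtest implementation: permutation calibration is swapped in here for simplicity, and it additionally assumes the two groups share the same mean vector (or have been centered beforehand). The function names `sample_cov`, `max_stat`, and `permutation_pvalue` are hypothetical, and data are assumed to be lists of rows with one row per sample and one column per gene.

```python
import random


def sample_cov(rows):
    """Return (covariance matrix with divisor n, centered rows) for a list of length-p rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[j] * c[k] for c in centered) / n for k in range(p)]
           for j in range(p)]
    return cov, centered


def max_stat(x, y):
    """Largest studentized squared difference between entries of the two sample covariances."""
    sx, cx = sample_cov(x)
    sy, cy = sample_cov(y)
    n, m, p = len(x), len(y), len(x[0])
    best = 0.0
    for j in range(p):
        for k in range(j, p):  # the matrices are symmetric, so the upper triangle suffices
            # Estimated variances of the (j, k) covariance estimates in each group.
            vx = sum((c[j] * c[k] - sx[j][k]) ** 2 for c in cx) / n
            vy = sum((c[j] * c[k] - sy[j][k]) ** 2 for c in cy) / m
            denom = vx / n + vy / m
            if denom > 0:
                best = max(best, (sx[j][k] - sy[j][k]) ** 2 / denom)
    return best


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Calibrate max_stat by shuffling group labels (assumes equal group means)."""
    rng = random.Random(seed)
    observed = max_stat(x, y)
    pooled = list(x) + list(y)  # new list, so x and y themselves are not reordered
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if max_stat(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    control = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
    disease = [[rng.gauss(0, 3) for _ in range(4)] for _ in range(30)]
    stat, pval = permutation_pvalue(control, disease)
    print(f"statistic = {stat:.3f}, permutation p-value = {pval:.4f}")
```

A max-type statistic is used in this sketch because it targets sparse alternatives, where only a few covariance entries differ between groups. The pure-Python loops cost on the order of p² · (n + m) operations per evaluation, so for realistic gene counts a vectorized implementation, or the HDtest package itself, would be the practical choice.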

Original language: English (US)
Pages (from-to): 31-41
Number of pages: 11
Journal: Biometrics
Volume: 73
Issue number: 1
State: Published - Mar 1 2017

Bibliographical note

Funding Information:
The authors thank the AE and two anonymous referees for constructive comments and suggestions which have improved the presentation of the article. Jinyuan Chang was supported in part by the Fundamental Research Funds for the Central Universities (Grant Nos. JBK160159, JBK150501, JBK140507, JBK120509), NSFC (Grant No. 11501462), the Center of Statistical Research at SWUFE, and the Australian Research Council. Wen Zhou was supported in part by NSF Grant IIS-1545994. Lan Wang was supported in part by NSF Grant DMS-1512267.

Publisher Copyright:
© 2016, The International Biometric Society

Keywords

  • Differential expression analysis
  • Gene clustering
  • High dimension
  • Hypothesis testing
  • Parametric bootstrap
  • Sparsity
