Abstract
Consider $n$ independent and identically distributed $p$-dimensional Gaussian random vectors with covariance matrix $\Sigma$. The problem of estimating $\Sigma$ when $p$ is much larger than $n$ has received a lot of attention in recent years. Yet little is known about information criteria for covariance matrix estimation. How should such a criterion be defined, and what are its statistical properties? We attempt to answer these questions in this paper by focusing on the estimation of bandable covariance matrices when $p > n$ but $\log(p) = o(n)$. Motivated by the deep connection between Stein's unbiased risk estimation (SURE) and the Akaike information criterion (AIC) in regression models, we propose a family of generalized SURE criteria ($\mathrm{SURE}_c$), indexed by a constant $c$, for covariance matrix estimation. When $c = 2$, $\mathrm{SURE}_2$ provides an unbiased estimator of the Frobenius risk of the covariance matrix estimator. Furthermore, we show that by minimizing $\mathrm{SURE}_2$ over all possible banding covariance matrix estimators, we attain the minimax optimal rate of convergence under the Frobenius norm, and the resulting estimator behaves like the covariance matrix estimator obtained by the so-called oracle tuning. When the true covariance matrix is exactly banded, we prove that by minimizing $\mathrm{SURE}_{\log(n)}$, we select the true bandwidth with probability tending to one. Therefore, our analysis indicates that $\mathrm{SURE}_2$ and $\mathrm{SURE}_{\log(n)}$ can be regarded as the AIC and the Bayesian information criterion (BIC), respectively, for large covariance matrix estimation.
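To make the notions of a banding estimator and oracle tuning concrete, the following is a minimal sketch in plain Python. It assumes the usual banding operator, which keeps the entries of a sample covariance matrix within $k$ of the diagonal and zeroes the rest, and it picks the bandwidth that minimizes the squared Frobenius loss against a known true covariance. That choice is only available to an oracle; it is not the SURE criterion of the paper, whose formula the abstract does not state. The function names and the toy matrices are illustrative assumptions, not taken from the paper.

```python
def band(matrix, k):
    """Banding operator: keep entries with |i - j| <= k, set the rest to zero."""
    p = len(matrix)
    return [[matrix[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]


def frobenius_sq(a, b):
    """Squared Frobenius distance between two equally sized matrices."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def oracle_bandwidth(sample_cov, true_cov):
    """Bandwidth minimizing the Frobenius loss, using the (normally unknown) truth."""
    p = len(sample_cov)
    losses = [frobenius_sq(band(sample_cov, k), true_cov) for k in range(p)]
    best_k = min(range(p), key=losses.__getitem__)
    return best_k, losses


# Toy data (made up for illustration): an exactly banded truth with bandwidth 1
# and a hand-written noisy "sample covariance" around it.
true_cov = [[1.0, 0.5, 0.0, 0.0],
            [0.5, 1.0, 0.5, 0.0],
            [0.0, 0.5, 1.0, 0.5],
            [0.0, 0.0, 0.5, 1.0]]
sample_cov = [[1.10, 0.40, 0.20, -0.10],
              [0.40, 0.90, 0.60, 0.10],
              [0.20, 0.60, 1.00, 0.45],
              [-0.10, 0.10, 0.45, 1.05]]

best_k, losses = oracle_bandwidth(sample_cov, true_cov)
print("oracle bandwidth:", best_k)
print("losses by bandwidth:", losses)
```

In this toy case a small bandwidth discards the true off-diagonal signal, while a large one keeps pure noise, so the loss is smallest at an intermediate bandwidth. A data-driven criterion such as the one described in the abstract aims to reproduce this choice without access to the true covariance.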
Original language | English (US) |
---|---|
Article number | 7407414 |
Pages (from-to) | 2153-2169 |
Number of pages | 17 |
Journal | IEEE Transactions on Information Theory |
Volume | 62 |
Issue number | 4 |
State | Published - Apr 2016 |
Bibliographical note
Funding Information: H. Zou was supported by the National Science Foundation under Grant DMS 1505111.
Publisher Copyright:
© 1963-2012 IEEE.
Keywords
- Covariance matrix
- High-dimensional asymptotics
- Information criteria
- Risk optimality
- Selection consistency