Abstract
Bandable covariance matrices are often used to model the dependence structure of variables that follow a natural ordering. It has been shown that the tapering covariance estimator attains the optimal minimax rate of convergence for estimating large bandable covariance matrices. The estimation risk depends critically on the choice of the tapering parameter. We develop a Stein's Unbiased Risk Estimation (SURE) theory for estimating the Frobenius risk of the tapering estimator. SURE tuning selects the minimizer of the SURE curve as the chosen tapering parameter. An extensive Monte Carlo study shows that SURE tuning is often comparable to the oracle tuning and outperforms cross-validation. We further illustrate SURE tuning using rock sonar spectrum data. The real data analysis results are consistent with the simulation findings.
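As a hedged illustration only (this code is not from the paper), the tapering estimator that the abstract refers to can be sketched as below. We assume the standard linear taper weights of Cai, Zhang and Zhou (2010), applied elementwise to the unbiased sample covariance; the function names and parameterization are our own.

```python
import numpy as np

def tapering_weights(p, k):
    """Linear taper weights w_ij: equal to 1 for |i - j| <= k/2,
    decaying linearly to 0 at |i - j| >= k (Cai-Zhang-Zhou form)."""
    idx = np.arange(p)
    dist = np.abs(idx[:, None] - idx[None, :])
    # clip(2 - dist/(k/2), 0, 1) gives 1 inside the half-bandwidth,
    # a linear ramp between k/2 and k, and 0 beyond k
    return np.clip(2.0 - dist / (k / 2.0), 0.0, 1.0)

def tapering_estimator(X, k):
    """Taper the unbiased sample covariance of the n-by-p data matrix X."""
    S = np.cov(X, rowvar=False)  # divisor n - 1, matching the appendix
    return tapering_weights(S.shape[0], k) * S
```

The tapering parameter k (the bandwidth) is exactly the quantity that SURE tuning chooses by minimizing the estimated Frobenius risk over a grid of k values.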
Original language | English (US)
---|---
Pages (from-to) | 339-351
Number of pages | 13
Journal | Computational Statistics and Data Analysis
Volume | 58
Issue number | 1
DOIs |
State | Published - Feb 2013
Bibliographical note
Funding Information: This work was supported in part by NSF grant DMS-0846068. The authors thank the editor, AE and referees for their helpful comments.

Appendix

Proof of Lemma 1. We start with Stein's identity (Efron, 2004):

$$(\hat\sigma_{ij}-\sigma_{ij})^2=(\hat\sigma_{ij}-\tilde\sigma^s_{ij})^2-(\tilde\sigma^s_{ij}-\sigma_{ij})^2+2(\hat\sigma_{ij}-\sigma_{ij})(\tilde\sigma^s_{ij}-\sigma_{ij}).\tag{A.1}$$

Taking expectations on both sides of (A.1) and summing over $i,j=1,\dots,p$ yields

$$E\|\hat\Sigma-\Sigma\|_F^2=E\|\hat\Sigma-\tilde\Sigma^s\|_F^2-\sum_{i=1}^p\sum_{j=1}^p\operatorname{var}(\tilde\sigma^s_{ij})+2\sum_{i=1}^p\sum_{j=1}^p\operatorname{cov}(\hat\sigma_{ij},\tilde\sigma^s_{ij}).$$

Note that $E[(\tilde\sigma^s_{ij}-\sigma_{ij})^2]=\operatorname{var}(\tilde\sigma^s_{ij})$ and $E[(\hat\sigma_{ij}-\sigma_{ij})(\tilde\sigma^s_{ij}-\sigma_{ij})]=\operatorname{cov}(\hat\sigma_{ij},\tilde\sigma^s_{ij})$ because $E\tilde\sigma^s_{ij}=\sigma_{ij}$.

Proof of Lemma 2. The estimators under consideration are translation invariant, so without loss of generality we let $\mu=E(x)=0$. A straightforward calculation based on the bivariate normal distribution gives

$$E(x_i^2x_j^2)=\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2,\tag{A.2}$$

which holds for both $i=j$ and $i\ne j$. Next,

$$E\big((\tilde\sigma^s_{ij})^2\big)=E\Big((n-1)^{-2}\Big(\sum_{k=1}^n x_{k,i}x_{k,j}-n\bar x_i\bar x_j\Big)^2\Big)=(n-1)^{-2}\Big\{E\Big(\Big(\sum_{k=1}^n x_{k,i}x_{k,j}\Big)^2\Big)-2n^{-1}\sum_{k=1}^n E(n\bar x_i\,n\bar x_j\,x_{k,i}x_{k,j})+n^2E(\bar x_i^2\bar x_j^2)\Big\}.\tag{A.3}$$

We also have

$$E\Big(\Big(n^{-1}\sum_{k=1}^n x_{k,i}x_{k,j}\Big)^2\Big)=\frac{1}{n}\operatorname{var}(x_ix_j)+\big(E(x_ix_j)\big)^2=\frac{1}{n}(\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2-\sigma_{ij}^2)+\sigma_{ij}^2=\frac{1}{n}\sigma_{ii}\sigma_{jj}+\frac{n+1}{n}\sigma_{ij}^2.\tag{A.4}$$

Note that $\bar X\sim N(0,\Sigma/n)$. Using (A.2) we have

$$n^2E(\bar x_i^2\bar x_j^2)=\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2.\tag{A.5}$$

Since every cross term containing an isolated factor has mean zero,

$$E(n\bar x_i\,n\bar x_j\,x_{k,i}x_{k,j})=\sum_{1\le l,l'\le n}\big\{I(l=l'\ne k)E(x_{l,i}x_{l,j}x_{k,i}x_{k,j})+I(l=l'=k)E(x_{k,i}^2x_{k,j}^2)\big\}=(n-1)\sigma_{ij}^2+(\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2).\tag{A.6}$$

Substituting (A.4)–(A.6) into (A.3) gives

$$E\big((\tilde\sigma^s_{ij})^2\big)=\frac{n\sigma_{ij}^2+\sigma_{ii}\sigma_{jj}}{n-1}.\tag{A.7}$$

Thus $\operatorname{var}(\tilde\sigma^s_{ij})=E\big((\tilde\sigma^s_{ij})^2\big)-\sigma_{ij}^2=\dfrac{\sigma_{ij}^2+\sigma_{ii}\sigma_{jj}}{n-1}$.
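The moment identities above are easy to check numerically. The following sketch (our own code, not the authors') verifies (A.2) and the variance formula $\operatorname{var}(\tilde\sigma^s_{ij})=(\sigma_{ij}^2+\sigma_{ii}\sigma_{jj})/(n-1)$ by Monte Carlo simulation on a bivariate normal with $\sigma_{ii}=1$, $\sigma_{jj}=2$, $\sigma_{ij}=0.8$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 200_000
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((reps, n, 2)) @ L.T  # reps independent samples of size n

# (A.2): E(x_i^2 x_j^2) = sigma_ii sigma_jj + 2 sigma_ij^2 = 1*2 + 2*0.64 = 3.28
m22 = np.mean(X[:, 0, 0] ** 2 * X[:, 0, 1] ** 2)

# variance of the unbiased sample covariance entry; theory: (0.64 + 2)/(n-1)
Xc = X - X.mean(axis=1, keepdims=True)
s12 = (Xc[:, :, 0] * Xc[:, :, 1]).sum(axis=1) / (n - 1)
var_hat = np.var(s12)

print(m22, var_hat)
```

Both printed estimates should match their theoretical values (3.28 and 2.64/9) up to Monte Carlo error.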
We now show (2.4) by deriving an expression for $E(\tilde\sigma^s_{ii}\tilde\sigma^s_{jj})$. Expanding $(n-1)^2\tilde\sigma^s_{ii}\tilde\sigma^s_{jj}=\big(\sum_{k=1}^n x_{k,i}^2-n\bar x_i^2\big)\big(\sum_{k'=1}^n x_{k',j}^2-n\bar x_j^2\big)$ and taking expectations gives

$$(n-1)^2E(\tilde\sigma^s_{ii}\tilde\sigma^s_{jj})=\sum_{1\le k,k'\le n}E(x_{k,i}^2x_{k',j}^2)-n\sum_{1\le k'\le n}E(\bar x_i^2x_{k',j}^2)-n\sum_{1\le k\le n}E(\bar x_j^2x_{k,i}^2)+n^2E(\bar x_i^2\bar x_j^2).\tag{A.8}$$

Repeatedly using (A.2) we have

$$\sum_{1\le k,k'\le n}E(x_{k,i}^2x_{k',j}^2)=n^2\sigma_{ii}\sigma_{jj}+2n\sigma_{ij}^2,\tag{A.9}$$

$$n^2E(\bar x_i^2x_{k',j}^2)=\sum_{1\le l,l'\le n}\big\{I(l=l'\ne k')E(x_{l,i}^2x_{k',j}^2)+I(l=l'=k')E(x_{k',i}^2x_{k',j}^2)\big\}=n\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2,\tag{A.10}$$

$$n^2E(\bar x_j^2x_{k,i}^2)=n\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2.\tag{A.11}$$

Substituting (A.5) and (A.9)–(A.11) into (A.8) gives

$$E(\tilde\sigma^s_{ii}\tilde\sigma^s_{jj})=\sigma_{ii}\sigma_{jj}+\frac{2}{n-1}\sigma_{ij}^2.\tag{A.12}$$

Combining (A.7) and (A.12) gives (2.4).
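The diagonal-product moment can likewise be checked by simulation against the classical Wishart identity $\operatorname{cov}(\tilde\sigma^s_{ii},\tilde\sigma^s_{jj})=2\sigma_{ij}^2/(n-1)$, i.e. $E(\tilde\sigma^s_{ii}\tilde\sigma^s_{jj})=\sigma_{ii}\sigma_{jj}+2\sigma_{ij}^2/(n-1)$. The sketch below (our own code) uses the same bivariate normal setup as before.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 200_000
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((reps, n, 2)) @ L.T  # reps independent samples of size n

# unbiased sample variances of each coordinate, one pair per replication
Xc = X - X.mean(axis=1, keepdims=True)
s11 = (Xc[:, :, 0] ** 2).sum(axis=1) / (n - 1)
s22 = (Xc[:, :, 1] ** 2).sum(axis=1) / (n - 1)

est = np.mean(s11 * s22)
# Wishart identity: sigma_ii sigma_jj + 2 sigma_ij^2 / (n - 1) = 2 + 1.28/9
ref = Sigma[0, 0] * Sigma[1, 1] + 2 * Sigma[0, 1] ** 2 / (n - 1)
print(est, ref)
```

The two printed numbers should agree up to Monte Carlo error.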
Keywords
- Covariance matrix
- Cross-validation
- Frobenius norm
- Operator norms
- SURE
- Tapering estimator