Generalized thresholding of large covariance matrices

Adam J. Rothman; Elizaveta Levina; Ji Zhu

doi:10.1198/jasa.2009.0101

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

304 Scopus citations

Abstract

We propose a new class of generalized thresholding operators that combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no computational burden. We obtain an explicit convergence rate in the operator norm that shows the tradeoff between the sparsity of the true model, dimension, and the sample size, and shows that generalized thresholding is consistent over a large class of models as long as the dimension p and the sample size n satisfy (log p)/n → 0. In addition, we show that generalized thresholding has the "sparsistency" property, meaning it estimates true zeros as zeros with probability tending to 1, and, under an additional mild condition, is sign consistent for nonzero elements. We show that generalized thresholding covers, as special cases, hard and soft thresholding, smoothly clipped absolute deviation, and adaptive lasso, and compare different types of generalized thresholding in a simulation study and in an example of gene clustering from a microarray experiment with tumor tissues.
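The four special cases named in the abstract can be illustrated with a minimal sketch. The formulas below are the commonly stated forms of hard thresholding, soft thresholding, SCAD, and the adaptive lasso rule; they are not quoted from this record, and the function names, the default parameters `a=3.7` and `eta=1.0`, the use of `np.cov` (which divides by n - 1), and the `keep_diagonal` option are all assumptions made for illustration.

```python
import numpy as np

def hard(z, lam):
    # Keep entries whose magnitude exceeds lam, zero out the rest.
    return np.where(np.abs(z) > lam, z, 0.0)

def soft(z, lam):
    # Shrink every entry toward zero by lam, truncating at zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def scad(z, lam, a=3.7):
    # SCAD rule: soft thresholding for small entries, a linear
    # interpolation for moderate ones, no shrinkage for large ones.
    # a = 3.7 is a conventional default (assumption); requires a > 2.
    absz = np.abs(z)
    middle = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    return np.where(absz <= 2.0 * lam, soft(z, lam),
                    np.where(absz <= a * lam, middle, z))

def adaptive_lasso(z, lam, eta=1.0):
    # Adaptive lasso rule: the shrinkage amount lam**(eta+1) / |z|**eta
    # decays as |z| grows. Zeros are handled separately to avoid 0-division.
    absz = np.abs(z)
    safe = np.where(absz > 0, absz, 1.0)
    shrunk = np.maximum(absz - lam ** (eta + 1.0) / safe ** eta, 0.0)
    return np.where(absz > 0, np.sign(z) * shrunk, 0.0)

def threshold_covariance(X, lam, rule=soft, keep_diagonal=True):
    # X has shape (n, p): rows are observations, columns are variables.
    S = np.cov(X, rowvar=False)   # sample covariance, n - 1 denominator
    T = rule(S, lam)              # apply the rule entry by entry
    if keep_diagonal:
        # Leaving variances unthresholded is a practical choice (assumption).
        np.fill_diagonal(T, np.diag(S))
    return T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 200))   # n = 50 samples, p = 200 variables
    for rule in (hard, soft, scad, adaptive_lasso):
        T = threshold_covariance(X, lam=0.4, rule=rule)
        print(rule.__name__, "nonzero entries:", int(np.count_nonzero(T)))
```

Each rule is applied entrywise, so the cost is a single pass over the p × p sample covariance matrix, which matches the abstract's remark about the small computational burden. The threshold `lam=0.4` in the usage example is arbitrary; in practice it would be chosen by a data-driven procedure such as cross-validation.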

Original language	English (US)
Pages (from-to)	177-186
Number of pages	10
Journal	Journal of the American Statistical Association
Volume	104
Issue number	485
DOIs	https://doi.org/10.1198/jasa.2009.0101
State	Published - Mar 2009

Bibliographical note

Funding Information:
Adam J. Rothman is a Ph.D. candidate, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: ajrothma@umich.edu). Elizaveta Levina is Assistant Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: elevina@umich.edu). Ji Zhu is Associate Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: jizhu@umich.edu). Elizaveta Levina’s research is supported in part by grants from the National Science Foundation (NSF; DMS-0505424 and DMS-0805798). Ji Zhu’s research is supported in part by grants from the NSF (DMS-0505432 and DMS-0705532). The authors thank an Associate Editor and two referees for helpful suggestions.

Copyright:
Copyright 2009 Elsevier B.V., All rights reserved.

Keywords

Covariance
High-dimensional data
Regularization
Sparsity
Thresholding

Cite this

@article{0a2aeeec6ba14c7ead371a8e74b57c52,

title = "Generalized thresholding of large covariance matrices",

abstract = "We propose a new class of generalized thresholding operators that combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no computational burden. We obtain an explicit convergence rate in the operator norm that shows the tradeoff between the sparsity of the true model, dimension, and the sample size, and shows that generalized thresholding is consistent over a large class of models as long as the dimension p and the sample size n satisfy (log p)/n → 0. In addition, we show that generalized thresholding has the {"}sparsistency{"} property, meaning it estimates true zeros as zeros with probability tending to 1, and, under an additional mild condition, is sign consistent for nonzero elements. We show that generalized thresholding covers, as special cases, hard and soft thresholding, smoothly clipped absolute deviation, and adaptive lasso, and compare different types of generalized thresholding in a simulation study and in an example of gene clustering from a microarray experiment with tumor tissues.",

keywords = "Covariance, High-dimensional data, Regularization, Sparsity, Thresholding",

author = "Rothman, {Adam J.} and Elizaveta Levina and Ji Zhu",

note = "Funding Information: Adam J. Rothman is a Ph.D. candidate, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: ajrothma@umich.edu). Elizaveta Levina is Assistant Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: elevina@umich.edu). Ji Zhu is Associate Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: jizhu@umich.edu). Elizaveta Levina{\textquoteright}s research is supported in part by grants from the National Science Foundation (NSF; DMS-0505424 and DMS-0805798). Ji Zhu{\textquoteright}s research is supported in part by grants from the NSF (DMS-0505432 and DMS-0705532). The authors thank an Associate Editor and two referees for helpful suggestions. Copyright: Copyright 2009 Elsevier B.V., All rights reserved.",

year = "2009",

month = mar,

doi = "10.1198/jasa.2009.0101",

language = "English (US)",

volume = "104",

pages = "177--186",

journal = "Journal of the American Statistical Association",

issn = "0162-1459",

publisher = "Taylor and Francis Ltd.",

number = "485",

}

TY - JOUR

T1 - Generalized thresholding of large covariance matrices

AU - Rothman, Adam J.

AU - Levina, Elizaveta

AU - Zhu, Ji

N1 - Funding Information: Adam J. Rothman is a Ph.D. candidate, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: ajrothma@umich.edu). Elizaveta Levina is Assistant Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: elevina@umich.edu). Ji Zhu is Associate Professor, Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107 (E-mail: jizhu@umich.edu). Elizaveta Levina’s research is supported in part by grants from the National Science Foundation (NSF; DMS-0505424 and DMS-0805798). Ji Zhu’s research is supported in part by grants from the NSF (DMS-0505432 and DMS-0705532). The authors thank an Associate Editor and two referees for helpful suggestions. Copyright: Copyright 2009 Elsevier B.V., All rights reserved.

PY - 2009/3

Y1 - 2009/3

N2 - We propose a new class of generalized thresholding operators that combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no computational burden. We obtain an explicit convergence rate in the operator norm that shows the tradeoff between the sparsity of the true model, dimension, and the sample size, and shows that generalized thresholding is consistent over a large class of models as long as the dimension p and the sample size n satisfy (log p)/n → 0. In addition, we show that generalized thresholding has the "sparsistency" property, meaning it estimates true zeros as zeros with probability tending to 1, and, under an additional mild condition, is sign consistent for nonzero elements. We show that generalized thresholding covers, as special cases, hard and soft thresholding, smoothly clipped absolute deviation, and adaptive lasso, and compare different types of generalized thresholding in a simulation study and in an example of gene clustering from a microarray experiment with tumor tissues.

AB - We propose a new class of generalized thresholding operators that combine thresholding with shrinkage, and study generalized thresholding of the sample covariance matrix in high dimensions. Generalized thresholding of the covariance matrix has good theoretical properties and carries almost no computational burden. We obtain an explicit convergence rate in the operator norm that shows the tradeoff between the sparsity of the true model, dimension, and the sample size, and shows that generalized thresholding is consistent over a large class of models as long as the dimension p and the sample size n satisfy (log p)/n → 0. In addition, we show that generalized thresholding has the "sparsistency" property, meaning it estimates true zeros as zeros with probability tending to 1, and, under an additional mild condition, is sign consistent for nonzero elements. We show that generalized thresholding covers, as special cases, hard and soft thresholding, smoothly clipped absolute deviation, and adaptive lasso, and compare different types of generalized thresholding in a simulation study and in an example of gene clustering from a microarray experiment with tumor tissues.

KW - Covariance

KW - High-dimensional data

KW - Regularization

KW - Sparsity

KW - Thresholding

UR - http://www.scopus.com/inward/record.url?scp=70350337963&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350337963&partnerID=8YFLogxK

U2 - 10.1198/jasa.2009.0101

DO - 10.1198/jasa.2009.0101

M3 - Article

AN - SCOPUS:70350337963

SN - 0162-1459

VL - 104

SP - 177

EP - 186

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

IS - 485

ER -
