TY - GEN

T1 - A divide-and-conquer procedure for sparse inverse covariance estimation

AU - Hsieh, Cho-Jui

AU - Dhillon, Inderjit S.

AU - Ravikumar, Pradeep

AU - Banerjee, Arindam

PY - 2012/12/1

Y1 - 2012/12/1

N2 - We consider the composite log-determinant optimization problem arising from the ℓ1-regularized Gaussian maximum likelihood estimator of a sparse inverse covariance matrix, in a high-dimensional setting with a very large number of variables. Recent work has shown this estimator to have strong statistical guarantees in recovering the true structure of the sparse inverse covariance matrix, or equivalently the underlying graph structure of the corresponding Gaussian Markov Random Field, even in very high-dimensional regimes with a limited number of samples. In this paper, we are concerned with the computational cost of solving the above optimization problem. Our proposed algorithm partitions the problem into smaller sub-problems and uses their solutions to build a good approximation to the original problem. Our key idea for the divide step, which obtains the sub-problem partition, is as follows: we first derive a tractable bound on the quality of the approximate solution obtained by solving the sub-divided problems; based on this bound, we propose a clustering algorithm that attempts to minimize it, in order to find effective partitions of the variables. For the conquer step, we use the approximate solution, i.e., the solution assembled from the sub-problems, as an initial point for solving the original problem, thereby achieving a much faster computational procedure.

AB - We consider the composite log-determinant optimization problem arising from the ℓ1-regularized Gaussian maximum likelihood estimator of a sparse inverse covariance matrix, in a high-dimensional setting with a very large number of variables. Recent work has shown this estimator to have strong statistical guarantees in recovering the true structure of the sparse inverse covariance matrix, or equivalently the underlying graph structure of the corresponding Gaussian Markov Random Field, even in very high-dimensional regimes with a limited number of samples. In this paper, we are concerned with the computational cost of solving the above optimization problem. Our proposed algorithm partitions the problem into smaller sub-problems and uses their solutions to build a good approximation to the original problem. Our key idea for the divide step, which obtains the sub-problem partition, is as follows: we first derive a tractable bound on the quality of the approximate solution obtained by solving the sub-divided problems; based on this bound, we propose a clustering algorithm that attempts to minimize it, in order to find effective partitions of the variables. For the conquer step, we use the approximate solution, i.e., the solution assembled from the sub-problems, as an initial point for solving the original problem, thereby achieving a much faster computational procedure.

UR - http://www.scopus.com/inward/record.url?scp=84877783090&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877783090&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84877783090

SN - 9781627480031

T3 - Advances in Neural Information Processing Systems

SP - 2330

EP - 2338

BT - Advances in Neural Information Processing Systems 25

T2 - 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012

Y2 - 3 December 2012 through 6 December 2012

ER -