A novel M-estimator for robust PCA

Teng Zhang; Gilad Lerman

Mathematics

Research output: Contribution to journal › Article › peer-review

88 Scopus citations

Abstract

We study the basic problem of robust subspace recovery. That is, we assume a data set in which some points are sampled around a fixed subspace and the rest are spread throughout the whole ambient space, and we aim to recover the fixed underlying subspace. We first estimate a "robust inverse sample covariance" by solving a convex minimization problem; we then recover the subspace from the bottom eigenvectors of this matrix (their number corresponds to the number of eigenvalues close to 0). We guarantee exact subspace recovery under some conditions on the underlying data. Furthermore, we propose a fast iterative algorithm that converges linearly to the matrix minimizing the convex problem. We also quantify the effect of noise and regularization and discuss many other practical and theoretical issues for improving the subspace recovery in various settings. When the sum of terms in the convex energy function (that we minimize) is replaced with the sum of squares of terms, the new minimizer is a scaled version of the inverse sample covariance (when it exists). We thus interpret our minimizer and its subspace (spanned by its bottom eigenvectors) as robust versions of the empirical inverse covariance and the PCA subspace, respectively. We compare our method with many other algorithms for robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy.
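
The two-step procedure in the abstract (estimate a robust inverse covariance, then take its bottom eigenvectors) can be illustrated with a short Python sketch. This is not the authors' implementation. The abstract does not state the energy function, so the code assumes it is the sum of ||Q x_i|| over symmetric matrices Q with trace 1, and it assumes an iteratively re-weighted least squares update of the form Q <- C^{-1} / tr(C^{-1}) with C = sum_i x_i x_i^T / max(||Q x_i||, delta). The names `robust_inverse_covariance`, `bottom_subspace`, and `delta` are hypothetical.

```python
import numpy as np


def robust_inverse_covariance(X, delta=1e-10, tol=1e-10, max_iter=500):
    """IRLS sketch: minimize sum_i ||Q x_i|| over symmetric Q with trace(Q) = 1.

    X is an (N, D) array of centered data points. delta floors the residual
    norms so that no weight divides by zero (an assumed regularization,
    not something stated in the abstract).
    """
    D = X.shape[1]
    Q = np.eye(D) / D  # start from the scaled identity
    for _ in range(max_iter):
        # Row i of X @ Q is (Q x_i)^T because Q is symmetric.
        norms = np.maximum(np.linalg.norm(X @ Q, axis=1), delta)
        # Weighted scatter matrix: sum_i x_i x_i^T / ||Q x_i||.
        C = (X / norms[:, None]).T @ X
        Q_new = np.linalg.inv(C)
        Q_new /= np.trace(Q_new)  # re-impose the trace-1 constraint
        if np.linalg.norm(Q_new - Q) < tol:
            return Q_new
        Q = Q_new
    return Q


def bottom_subspace(Q, d):
    """Orthonormal basis (D, d) spanned by the d bottom eigenvectors of Q."""
    _, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
    return eigvecs[:, :d]
```

A call such as `bottom_subspace(robust_inverse_covariance(X), d)` returns a candidate basis for the d-dimensional subspace. If the weights are replaced by a constant, C reduces to the unnormalized sample covariance and a single update returns its inverse scaled to unit trace, which is consistent with the abstract's remark about the sum-of-squares energy.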

Original language	English (US)
Pages (from-to)	749-808
Number of pages	60
Journal	Journal of Machine Learning Research
Volume	15
State	Published - Feb 2014

Keywords

Convex relaxation
Iteratively re-weighted least squares
M-estimator
Principal components analysis
Robust statistics

Cite this

@article{3dab61c46bba4d8d96d8a2ce779b22d8,

title = "A novel M-estimator for robust PCA",

abstract = "We study the basic problem of robust subspace recovery. That is, we assume a data set in which some points are sampled around a fixed subspace and the rest are spread throughout the whole ambient space, and we aim to recover the fixed underlying subspace. We first estimate a {"}robust inverse sample covariance{"} by solving a convex minimization problem; we then recover the subspace from the bottom eigenvectors of this matrix (their number corresponds to the number of eigenvalues close to 0). We guarantee exact subspace recovery under some conditions on the underlying data. Furthermore, we propose a fast iterative algorithm that converges linearly to the matrix minimizing the convex problem. We also quantify the effect of noise and regularization and discuss many other practical and theoretical issues for improving the subspace recovery in various settings. When the sum of terms in the convex energy function (that we minimize) is replaced with the sum of squares of terms, the new minimizer is a scaled version of the inverse sample covariance (when it exists). We thus interpret our minimizer and its subspace (spanned by its bottom eigenvectors) as robust versions of the empirical inverse covariance and the PCA subspace, respectively. We compare our method with many other algorithms for robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy.",

keywords = "Convex relaxation, Iteratively re-weighted least squares, M-estimator, Principal components analysis, Robust statistics",

author = "Teng Zhang and Gilad Lerman",

year = "2014",

month = feb,

language = "English (US)",

volume = "15",

pages = "749--808",

journal = "Journal of Machine Learning Research",

issn = "1532-4435",

publisher = "Microtome Publishing",

}

TY - JOUR

T1 - A novel M-estimator for robust PCA

AU - Zhang, Teng

AU - Lerman, Gilad

PY - 2014/2

Y1 - 2014/2

N2 - We study the basic problem of robust subspace recovery. That is, we assume a data set in which some points are sampled around a fixed subspace and the rest are spread throughout the whole ambient space, and we aim to recover the fixed underlying subspace. We first estimate a "robust inverse sample covariance" by solving a convex minimization problem; we then recover the subspace from the bottom eigenvectors of this matrix (their number corresponds to the number of eigenvalues close to 0). We guarantee exact subspace recovery under some conditions on the underlying data. Furthermore, we propose a fast iterative algorithm that converges linearly to the matrix minimizing the convex problem. We also quantify the effect of noise and regularization and discuss many other practical and theoretical issues for improving the subspace recovery in various settings. When the sum of terms in the convex energy function (that we minimize) is replaced with the sum of squares of terms, the new minimizer is a scaled version of the inverse sample covariance (when it exists). We thus interpret our minimizer and its subspace (spanned by its bottom eigenvectors) as robust versions of the empirical inverse covariance and the PCA subspace, respectively. We compare our method with many other algorithms for robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy.

AB - We study the basic problem of robust subspace recovery. That is, we assume a data set in which some points are sampled around a fixed subspace and the rest are spread throughout the whole ambient space, and we aim to recover the fixed underlying subspace. We first estimate a "robust inverse sample covariance" by solving a convex minimization problem; we then recover the subspace from the bottom eigenvectors of this matrix (their number corresponds to the number of eigenvalues close to 0). We guarantee exact subspace recovery under some conditions on the underlying data. Furthermore, we propose a fast iterative algorithm that converges linearly to the matrix minimizing the convex problem. We also quantify the effect of noise and regularization and discuss many other practical and theoretical issues for improving the subspace recovery in various settings. When the sum of terms in the convex energy function (that we minimize) is replaced with the sum of squares of terms, the new minimizer is a scaled version of the inverse sample covariance (when it exists). We thus interpret our minimizer and its subspace (spanned by its bottom eigenvectors) as robust versions of the empirical inverse covariance and the PCA subspace, respectively. We compare our method with many other algorithms for robust PCA on synthetic and real data sets and demonstrate state-of-the-art speed and accuracy.

KW - Convex relaxation

KW - Iteratively re-weighted least squares

KW - M-estimator

KW - Principal components analysis

KW - Robust statistics

UR - http://www.scopus.com/inward/record.url?scp=84897049186&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897049186&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84897049186

SN - 1532-4435

VL - 15

SP - 749

EP - 808

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

ER -
