l_p-Recovery of the Most Significant Subspace Among Multiple Subspaces with Outliers

Gilad Lerman; Teng Zhang

doi:10.1007/s00365-014-9242-6

Mathematics

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the l_p-averaged distances of data points from d-dimensional subspaces of R^D, where p > 0. Unlike other l_p minimization problems, this minimization is nonconvex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then for any fraction of outliers, the most significant subspace can be recovered by l_p minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by l_p minimization for any 0 < p ≤ 1, with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.
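
As a rough illustration of the quantity the abstract describes, the following Python sketch computes the average of the p-th powers of the distances from data points to a candidate subspace, and then scans candidate lines in the plane for the one with the smallest value. It is a toy under stated assumptions, not the authors' method or analysis: the function names (dist_to_subspace, lp_energy, best_angle_index), the five hand-picked points in R^2 (so d = 1, D = 2), and the one-degree grid search are all made up for this example, and the basis passed in is assumed to be orthonormal.

```python
import math

def dist_to_subspace(x, basis):
    """Euclidean distance from x to the linear subspace spanned by `basis`.

    `basis` is a list of orthonormal vectors (assumed, not checked).
    """
    sq_norm = sum(v * v for v in x)
    sq_proj = sum(sum(a * b for a, b in zip(x, u)) ** 2 for u in basis)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(sq_norm - sq_proj, 0.0))

def lp_energy(points, basis, p):
    """Average of dist(x, L)^p over the data points, for p > 0."""
    return sum(dist_to_subspace(x, basis) ** p for x in points) / len(points)

def best_angle_index(points, p, steps=180):
    """Grid search over lines through the origin in R^2.

    Candidate k is the line at angle k * pi / steps; returns the k with
    the smallest l_p energy. A hypothetical brute-force stand-in for a
    real minimizer, usable only in this 2-D toy setting.
    """
    def energy(k):
        t = k * math.pi / steps
        return lp_energy(points, [(math.cos(t), math.sin(t))], p)
    return min(range(steps), key=energy)

# Toy data: three points on the x-axis (the heaviest subspace),
# one point on the y-axis, and one point off both lines.
points = [(1.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]

for p in (1.0, 2.0):
    k = best_angle_index(points, p)
    print(f"p = {p}: minimizing line at {k} degrees")
```

On this particular point set the grid minimizer for p = 1 is the x-axis itself, while for p = 2 the minimizing line is tilted away from it by the other points. This only mirrors the flavor of the stated results on one deterministic example; the paper's statements are probabilistic and concern the sampling model described above.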

Original language	English (US)
Pages (from-to)	329-385
Number of pages	57
Journal	Constructive Approximation
Volume	40
Issue number	3
DOIs	https://doi.org/10.1007/s00365-014-9242-6
State	Published - Dec 2014

Bibliographical note

Publisher Copyright:
© 2014, Springer Science+Business Media New York.

Keywords

Best approximating subspace
Geometric probability
Hybrid linear modeling
Optimization on the Grassmannian
Robust statistics
l_p minimization

Cite this

@article{bbd7088358424c3391ca59a6b25d401b,

title = "$l_p$-Recovery of the Most Significant Subspace Among Multiple Subspaces with Outliers",

abstract = "We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the $l_p$-averaged distances of data points from d-dimensional subspaces of $\mathbb{R}^D$, where $p > 0$. Unlike other $l_p$ minimization problems, this minimization is nonconvex for all $p > 0$ and thus requires different methods for its analysis. We show that if $0 < p \leq 1$, then for any fraction of outliers, the most significant subspace can be recovered by $l_p$ minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by $l_p$ minimization for any $0 < p \leq 1$, with an error proportional to the noise level. On the other hand, if $p > 1$ and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.",

keywords = "Best approximating subspace, Geometric probability, Hybrid linear modeling, Optimization on the Grassmannian, Robust statistics, $l_p$ minimization",

author = "Gilad Lerman and Teng Zhang",

note = "Publisher Copyright: {\textcopyright} 2014, Springer Science+Business Media New York.",

year = "2014",

month = dec,

doi = "10.1007/s00365-014-9242-6",

language = "English (US)",

volume = "40",

pages = "329--385",

journal = "Constructive Approximation",

issn = "0176-4276",

publisher = "Springer New York",

number = "3",

}

TY - JOUR

T1 - l_p-Recovery of the Most Significant Subspace Among Multiple Subspaces with Outliers

AU - Lerman, Gilad

AU - Zhang, Teng

PY - 2014/12

Y1 - 2014/12

N2 - We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the l_p-averaged distances of data points from d-dimensional subspaces of R^D, where p > 0. Unlike other l_p minimization problems, this minimization is nonconvex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then for any fraction of outliers, the most significant subspace can be recovered by l_p minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by l_p minimization for any 0 < p ≤ 1, with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.

AB - We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the l_p-averaged distances of data points from d-dimensional subspaces of R^D, where p > 0. Unlike other l_p minimization problems, this minimization is nonconvex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then for any fraction of outliers, the most significant subspace can be recovered by l_p minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by l_p minimization for any 0 < p ≤ 1, with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.

KW - Best approximating subspace

KW - Geometric probability

KW - Hybrid linear modeling

KW - Optimization on the Grassmannian

KW - Robust statistics

KW - l_p minimization

UR - http://www.scopus.com/inward/record.url?scp=84919903045&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84919903045&partnerID=8YFLogxK

U2 - 10.1007/s00365-014-9242-6

DO - 10.1007/s00365-014-9242-6

M3 - Article

AN - SCOPUS:84919903045

SN - 0176-4276

VL - 40

SP - 329

EP - 385

JO - Constructive Approximation

JF - Constructive Approximation

IS - 3

ER -