TY - JOUR
T1 - l_p-Recovery of the Most Significant Subspace Among Multiple Subspaces with Outliers
AU - Lerman, Gilad
AU - Zhang, Teng
N1 - Publisher Copyright:
© 2014, Springer Science+Business Media New York.
PY - 2014/12
Y1 - 2014/12
N2 - We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the l_p-averaged distances of data points from d-dimensional subspaces of R^D, where 0 < p ∈ R. Unlike other l_p minimization problems, this minimization is nonconvex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then for any fraction of outliers, the most significant subspace can be recovered by l_p minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by l_p minimization for any 0 < p ≤ 1 with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.
AB - We assume data sampled from a mixture of d-dimensional linear subspaces with spherically symmetric distributions within each subspace and an additional outlier component with spherically symmetric distribution within the ambient space (for simplicity, we may assume that all distributions are uniform on their corresponding unit spheres). We also assume mixture weights for the different components. We say that one of the underlying subspaces of the model is most significant if its mixture weight is higher than the sum of the mixture weights of all other subspaces. We study the recovery of the most significant subspace by minimizing the l_p-averaged distances of data points from d-dimensional subspaces of R^D, where 0 < p ∈ R. Unlike other l_p minimization problems, this minimization is nonconvex for all p > 0 and thus requires different methods for its analysis. We show that if 0 < p ≤ 1, then for any fraction of outliers, the most significant subspace can be recovered by l_p minimization with overwhelming probability (which depends on the generating distribution and its parameters). We show that when adding small noise around the underlying subspaces, the most significant subspace can be nearly recovered by l_p minimization for any 0 < p ≤ 1 with an error proportional to the noise level. On the other hand, if p > 1 and there is more than one underlying subspace, then with overwhelming probability the most significant subspace cannot be recovered or nearly recovered. This last result does not require spherically symmetric outliers.
KW - Best approximating subspace
KW - Geometric probability
KW - Hybrid linear modeling
KW - Optimization on the Grassmannian
KW - Robust statistics
KW - l_p minimization
UR - http://www.scopus.com/inward/record.url?scp=84919903045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919903045&partnerID=8YFLogxK
U2 - 10.1007/s00365-014-9242-6
DO - 10.1007/s00365-014-9242-6
M3 - Article
AN - SCOPUS:84919903045
VL - 40
SP - 329
EP - 385
JO - Constructive Approximation
JF - Constructive Approximation
SN - 0176-4276
IS - 3
ER -