TY - JOUR
T1 - Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization
AU - Li, Xingguo
AU - Lu, Junwei
AU - Arora, Raman
AU - Haupt, Jarvis
AU - Liu, Han
AU - Wang, Zhaoran
AU - Zhao, Tuo
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). In particular, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in the low-rank matrix factorization problem. We illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (R1) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvature; (R2) the region containing the neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (R3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such a global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
AB - We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). In particular, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of the nonconvex optimization in the low-rank matrix factorization problem. We illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (R1) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvature; (R2) the region containing the neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (R3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such a global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
KW - strict saddle problem
KW - global landscape
KW - invariant group
KW - matrix sensing
KW - nonconvex matrix factorization
UR - http://www.scopus.com/inward/record.url?scp=85065984137&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065984137&partnerID=8YFLogxK
U2 - 10.1109/TIT.2019.2898663
DO - 10.1109/TIT.2019.2898663
M3 - Article
AN - SCOPUS:85065984137
SN - 0018-9448
VL - 65
SP - 3489
EP - 3514
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 6
M1 - 8675509
ER -