We consider the task of classification in the high dimensional setting where the number of features of the given data is significantly greater than the number of observations. To accomplish this task, we propose a heuristic, called sparse zero-variance discriminant analysis, for simultaneously performing linear discriminant analysis and feature selection on high dimensional data. This method combines classical zero-variance discriminant analysis, where discriminant vectors are identified in the null space of the sample within-class covariance matrix, with penalization applied to induce sparse structures in the resulting vectors. To approximately solve the resulting nonconvex problem, we develop a simple algorithm based on the alternating direction method of multipliers. Further, we show that this algorithm is applicable to a larger class of penalized generalized eigenvalue problems, including a particular relaxation of the sparse principal component analysis problem. Finally, we establish theoretical guarantees for convergence of our algorithm to stationary points of the original nonconvex problem, and empirically demonstrate the effectiveness of our heuristic for classifying simulated data and data drawn from applications in time-series classification.
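The abstract's algorithm is built on the alternating direction method of multipliers. As a hedged sketch only (the paper's specific subproblem splitting is not given in this record), the standard scaled-form ADMM iteration for a generic split objective $\min_{x,z}\, f(x) + g(z)$ subject to $Ax + Bz = c$ is:

```latex
\begin{align*}
x^{k+1} &= \operatorname*{arg\,min}_{x}\; f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^{k} - c + u^{k}\|_2^2, \\
z^{k+1} &= \operatorname*{arg\,min}_{z}\; g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^{k}\|_2^2, \\
u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c,
\end{align*}
```

where $\rho > 0$ is a penalty parameter and $u$ is the scaled dual variable. In penalized eigenvalue settings such as the one described above, $f$ would typically carry the (nonconvex) variance or discriminant objective and $g$ the sparsity-inducing penalty; the exact assignment here is an assumption, not the paper's stated formulation.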
Funding Information:
The work presented in this paper was partially carried out while B.P.W. Ames was a postdoctoral fellow at the Institute for Mathematics and its Applications during the IMA's annual program on Mathematics of Information (supported by National Science Foundation (NSF) award DMS-0931945), and while B.P.W. Ames was a Von Karman instructor at the California Institute of Technology, supported by Joel Tropp under Office of Naval Research (ONR) award N00014-11-1002. This research was also supported by University of Alabama Research Grant RG14678. We are grateful to Fadil Santosa, Krystal Taylor, Zhi-Quan Luo, Meisam Razaviyayn, and Line Clemmensen for their insights and helpful suggestions. Finally, we are grateful for the contributions of two anonymous reviewers whose suggestions have greatly improved this manuscript.
© 2016, Springer Science+Business Media New York.
- Alternating direction method of multipliers
- Dimension reduction
- Feature selection
- Linear discriminant analysis
- Nonconvex optimization