Abstract
Distance-weighted discrimination (DWD) is a modern margin-based classifier with an interesting geometric motivation. It was proposed as a competitor to the support vector machine (SVM). Despite many recent references on DWD, DWD is far less popular than the SVM, mainly for computational and theoretical reasons. We greatly advance the current DWD methodology and its learning theory. We propose a novel thrifty algorithm for solving standard DWD and generalized DWD, and our algorithm can be several hundred times faster than the existing state-of-the-art algorithm based on second-order cone programming. In addition, we exploit the new algorithm to design an efficient scheme to tune generalized DWD. Furthermore, we formulate a natural kernel DWD approach in a reproducing kernel Hilbert space and then establish the Bayes risk consistency of the kernel DWD by using a universal kernel such as the Gaussian kernel. This result solves an open theoretical problem in the DWD literature. A comparison study on 16 benchmark data sets shows that data-driven generalized DWD consistently delivers higher classification accuracy with less computation time than the SVM.
Original language | English (US) |
---|---|
Pages (from-to) | 177-198 |
Number of pages | 22 |
Journal | Journal of the Royal Statistical Society. Series B: Statistical Methodology |
Volume | 80 |
Issue number | 1 |
State | Published - Jan 2018 |
Bibliographical note
Funding Information: We thank the Joint Editor, an Associate Editor and three referees for their helpful comments that greatly improved this work. We thank Professor Defeng Sun and Professor Kim-Chuan Toh for sharing the MATLAB toolbox DWDLarge that implements the inexact symmetric Gauss–Seidel alternating direction method of multipliers algorithm. Zou’s research was partially supported by National Science Foundation grant DMS-1505111.
Publisher Copyright: © 2017 Royal Statistical Society
Keywords
- Bayes risk consistency
- Classification
- Distance-weighted discrimination
- Kernel learning
- Majorization–minimization principle
- Second-order cone programming