Abstract
Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
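To illustrate how an L1 penalty can realize variable selection via thresholding, the following is a minimal sketch, not the authors' implementation. It assumes the variables are standardized so that a variable whose mean is zero in every cluster carries no clustering information. The function names, the example means, and the single fixed threshold are all hypothetical. In an actual EM fit the threshold would typically depend on the penalization parameter, the variable's variance, and the cluster membership weights.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0.0 if |value| <= threshold."""
    magnitude = abs(value) - threshold
    if magnitude <= 0.0:
        return 0.0
    return magnitude if value > 0 else -magnitude


def selected_variables(cluster_means: list[list[float]], threshold: float) -> list[int]:
    """Return indices of variables whose shrunken mean is nonzero in at least one cluster.

    cluster_means[i][k] is the (hypothetical) unpenalized mean of variable k in cluster i.
    """
    n_variables = len(cluster_means[0])
    kept = []
    for k in range(n_variables):
        # A variable is dropped only if thresholding zeroes its mean in every cluster.
        if any(soft_threshold(means[k], threshold) != 0.0 for means in cluster_means):
            kept.append(k)
    return kept


# Hypothetical unpenalized cluster means: 2 clusters, 3 standardized variables.
means = [[1.5, 0.1, -0.2], [-1.2, -0.05, 0.3]]
print(selected_variables(means, threshold=0.5))  # prints [0]
```

In this toy input only the first variable survives, because the means of the other two are shrunk exactly to zero in both clusters. This is the sense in which soft-thresholding delivers a sparse solution.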
Original language | English (US) |
---|---|
Pages (from-to) | 1145-1164 |
Number of pages | 20 |
Journal | Journal of Machine Learning Research |
Volume | 8 |
State | Published - May 2007 |
Keywords
- BIC
- EM
- Mixture model
- Penalized likelihood
- Shrinkage
- Soft-thresholding