Abstract
Penalized model-based clustering has been proposed for high-dimensional data with small sample sizes, such as those arising from genomic studies; in particular, it can be used for variable selection. A new regularization scheme is proposed to group together multiple parameters of the same variable across clusters, which is shown both analytically and numerically to be more effective than the conventional L1 penalty for variable selection. In addition, we develop a strategy to combine this grouping scheme with the grouping of structured variables. Simulation studies and applications to microarray gene expression data for cancer subtype discovery demonstrate the advantage of the new proposal over several existing approaches.
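To illustrate the contrast the abstract draws between the L1 penalty and grouping a variable's parameters across clusters, the following is a minimal sketch, not the paper's estimator. It assumes standardized data (so a variable is uninformative when all of its cluster means are zero) and uses the standard soft-thresholding and group-thresholding operators in place of the actual EM updates; the function names, the matrix `mu`, and the value of `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(mu, lam):
    """Elementwise L1 shrinkage: each cluster mean is shrunk on its own."""
    return np.sign(mu) * np.maximum(np.abs(mu) - lam, 0.0)

def group_threshold(mu, lam):
    """Group shrinkage: each column (one variable, all clusters) is scaled
    by a factor based on its L2 norm, so its means reach zero together."""
    norms = np.linalg.norm(mu, axis=0, keepdims=True)
    # Guard against division by zero for columns that are already all zero.
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return mu * scale

# Hypothetical cluster means: rows are clusters, columns are variables.
mu = np.array([[ 2.0,  0.3, 0.9],
               [-1.5, -0.2, 0.1]])
lam = 0.5  # assumed penalty level, chosen only for illustration

l1 = soft_threshold(mu, lam)    # zeros can appear in single entries
grp = group_threshold(mu, lam)  # zeros appear only as whole columns

print("L1 penalty:\n", l1)
print("Grouped penalty:\n", grp)
```

In this toy example the L1 operator zeroes the third variable in one cluster but not the other, so that variable is not removed, whereas the grouped operator either keeps or drops a variable for all clusters at once.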
Original language | English (US) |
---|---|
Pages (from-to) | 921-930 |
Number of pages | 10 |
Journal | Biometrics |
Volume | 64 |
Issue number | 3 |
State | Published - Sep 2008 |
Keywords
- BIC
- Diagonal covariance
- EM algorithm
- High-dimension but low-sample size
- Microarray gene expression
- Mixture model
- Penalized likelihood