With an increasing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to obtain a more reliable and powerful analysis of a given dataset. Unlike meta-analysis, we do not assume that subjects in the current study and in the other relevant studies are drawn from the same population. In particular, the parameters of the current study may differ from those of the other studies. We consider sample classification based on gene expression profiles in this context. We propose two new methods, a weighted partial least squares (WPLS) method and a weighted penalized partial least squares (WPPLS) method, to build a classifier through the combined use of multiple datasets. The methods can weight the individual datasets according to their relevance to the current study. A more standard approach is first to build a classifier on each individual dataset and then to combine the outputs of the multiple classifiers by weighted voting. Using two quite different datasets on human heart failure, we show first that WPLS/WPPLS, by borrowing information from the other dataset, can improve on the performance of PLS/PPLS built on a single dataset only. Second, WPLS/WPPLS performs better than the standard approach of combining multiple classifiers. Third, WPPLS can improve on WPLS, just as PPLS does on PLS for a single dataset.
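The "standard approach" mentioned above, combining classifiers built on separate datasets by weighted voting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_vote`, the use of signed scores (positive for one class, negative for the other), and the example weights are all assumptions made for illustration, since the abstract does not specify how the votes or weights are defined.

```python
from typing import Sequence


def weighted_vote(scores: Sequence[float], weights: Sequence[float]) -> int:
    """Combine signed scores from several classifiers into one class label.

    Assumption (not from the abstract): each classifier, trained on its own
    dataset, emits a signed score for a sample, where a positive score
    favours class +1 and a negative score favours class -1. Each weight
    reflects how relevant that classifier's dataset is to the current study.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Weighted sum of the individual votes; its sign gives the combined label.
    total = sum(w * s for w, s in zip(weights, scores))
    return 1 if total >= 0 else -1


# Hypothetical example: two classifiers disagree on one sample.
# The first (weight 0.7) leans towards +1, the second (weight 0.3) towards -1.
label = weighted_vote(scores=[0.8, -0.3], weights=[0.7, 0.3])
```

In this hypothetical example the weighted sum is 0.7 × 0.8 + 0.3 × (−0.3) = 0.47, so the combined label is +1. WPLS/WPPLS differ from this baseline in that the datasets are weighted while a single classifier is being built, rather than after separate classifiers have been fitted.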
|Original language||English (US)|
|Number of pages||8|
|Journal||Computational Biology and Chemistry|
|State||Published - Jun 2005|
Bibliographical note
Funding Information:
X.H. and W.P. were partially supported by an NIH grant. J.H. was supported by an AHA grant, the Lillehei Heart Institute and the Minnesota Medical Foundation.
Copyright 2008 Elsevier B.V., All rights reserved.
- Gradient directed path
- Partial least squares
- Penalized partial least squares
- Squared error loss