Using input dependent weights for model combination and model selection with multiple sources of data

Wei Pan; Guanghua Xiao; Xiaohong Huang

Biostatistics

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge, however, is to account for the different local information content of different sources of data: the choice of the weight on each candidate model (and thus each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme in which the weight is the probability of each model's giving a correct prediction for the given input. The weights can be estimated by regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. It is demonstrated that IDW may perform better than some standard approaches. Input-dependent weights can also be adopted as a criterion for model selection.
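The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a scalar input, two hypothetical base models (`model_a`, `model_b`), invented toy training data, a logistic regression of each model's correctness indicator on the input (the abstract says only "regression"), and weights normalized to sum to one when combining. All function and variable names are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, cs, lr=0.5, steps=2000):
    """Fit P(c = 1 | x) = sigmoid(a + b * x) by gradient ascent.
    Logistic regression is an assumed choice of regression model."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(c - sigmoid(a + b * x) for x, c in zip(xs, cs)) / n
        grad_b = sum((c - sigmoid(a + b * x)) * x for x, c in zip(xs, cs)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b

# Hypothetical base models, each standing in for a model built on one data
# source. Each returns an estimated probability of class 1.
def model_a(x):
    return 0.9 if x < -1 else 0.1   # only detects class 1 on the left

def model_b(x):
    return 0.9 if x > 1 else 0.1    # only detects class 1 on the right

MODELS = {"A": model_a, "B": model_b}

# Invented toy training data: class 1 when |x| > 1.
train_x = [-2.0, -1.5, -1.2, -0.5, 0.0, 0.5, 1.2, 1.5, 2.0]
train_y = [1, 1, 1, 0, 0, 0, 1, 1, 1]

def correctness(model, xs, ys):
    """1 where the model's thresholded prediction matches the label, else 0."""
    return [int((model(x) >= 0.5) == (y == 1)) for x, y in zip(xs, ys)]

# One regression per model: correctness indicator regressed on the input.
COEFS = {name: fit_logistic(train_x, correctness(m, train_x, train_y))
         for name, m in MODELS.items()}

def weights(x):
    """Estimated probability that each model is correct at input x."""
    return {name: sigmoid(a + b * x) for name, (a, b) in COEFS.items()}

def combine(x):
    """Weighted average of model outputs (normalization is an assumption)."""
    w = weights(x)
    total = sum(w.values())
    return sum(w[name] * MODELS[name](x) for name in MODELS) / total

def select(x):
    """Model selection: pick the model with the largest weight at x."""
    w = weights(x)
    return max(w, key=w.get)
```

In this toy setup, `combine(1.5)` leans on `model_b` and `combine(-1.5)` on `model_a`, whereas a constant-weight average of the two would give the same weight to a model in the region where it is unreliable. `select(x)` shows the model-selection use of the same weights.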

Original language	English (US)
Pages (from-to)	523-540
Number of pages	18
Journal	Statistica Sinica
Volume	16
Issue number	2
State	Published - Apr 1 2006

Keywords

Classification
Microarray data
Model mixing
Partial least squares
Prediction

Cite this

@article{d49174ade8e04c8f84369a06bd945340,

title = "Using input dependent weights for model combination and model selection with multiple sources of data",

abstract = "With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge however is to account for different local information contents of different sources of data: the choice of the weight on each candidate model (and thus each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme with the weight being the probability of each model's giving a correct prediction for the given input. The weights can be estimated based on regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. It is demonstrated that IDW may perform better than some standard approaches. Input-dependent weights can be also adopted as a criterion for model selection.",

keywords = "Classification, Microarray data, Model mixing, Partial least squares, Prediction",

author = "Wei Pan and Guanghua Xiao and Xiaohong Huang",

year = "2006",

month = apr,

day = "1",

language = "English (US)",

volume = "16",

pages = "523--540",

journal = "Statistica Sinica",

issn = "1017-0405",

publisher = "Institute of Statistical Science",

number = "2",

}

TY - JOUR

T1 - Using input dependent weights for model combination and model selection with multiple sources of data

AU - Pan, Wei

AU - Xiao, Guanghua

AU - Huang, Xiaohong

PY - 2006/4/1

Y1 - 2006/4/1

N2 - With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge however is to account for different local information contents of different sources of data: the choice of the weight on each candidate model (and thus each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme with the weight being the probability of each model's giving a correct prediction for the given input. The weights can be estimated based on regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. It is demonstrated that IDW may perform better than some standard approaches. Input-dependent weights can be also adopted as a criterion for model selection.

AB - With various sources and large amounts of genomic and proteomic data accumulating, the importance of integrative analyses of multiple sources of data has been increasingly recognized. A natural approach is to combine multiple models, each built on one source of data. A challenge however is to account for different local information contents of different sources of data: the choice of the weight on each candidate model (and thus each source of data) may depend on the input for which a prediction is to be made, suggesting that the constant weights used in most existing approaches may not be optimal. Here we propose an input-dependent weighting (IDW) scheme with the weight being the probability of each model's giving a correct prediction for the given input. The weights can be estimated based on regression using training data. We apply IDW to discriminating human heart failure etiology using two sources of gene expression data, and to gene function prediction by a combined analysis of gene expression and protein-protein interaction data. It is demonstrated that IDW may perform better than some standard approaches. Input-dependent weights can be also adopted as a criterion for model selection.

KW - Classification

KW - Microarray data

KW - Model mixing

KW - Partial least squares

KW - Prediction

UR - http://www.scopus.com/inward/record.url?scp=33746146485&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746146485&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33746146485

SN - 1017-0405

VL - 16

SP - 523

EP - 540

JO - Statistica Sinica

JF - Statistica Sinica

IS - 2

ER -
