TY - JOUR
T1 - Multi-assay-based structure-activity relationship models
T2 - Improving structure-activity relationship models by incorporating activity information from related targets
AU - Ning, Xia
AU - Rangwala, Huzefa
AU - Karypis, George
PY - 2009/11/23
Y1 - 2009/11/23
N2 - Structure-activity relationship (SAR) models are used to inform and to guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper, we present a new class of methods for building SAR models, referred to as multi-assay based, that utilize activity information from different targets. These methods first identify a set of targets that are related to the target under consideration, and then they employ various machine learning techniques that utilize activity information from these targets in order to build the desired SAR model. We developed different methods for identifying the set of related targets, which take into account the primary sequence of the targets or the structure of their ligands, and we also developed different machine learning techniques that were derived by using principles of semi-supervised learning, multi-task learning, and classifier ensembles. The comprehensive evaluation of these methods shows that they lead to considerable improvements over the standard SAR models that are based only on the ligands of the target under consideration. On a set of 117 protein targets, obtained from PubChem, these multi-assay-based methods achieve a receiver-operating characteristic score that is, on the average, 7.0-7.2% higher than that achieved by the standard SAR models. Moreover, on a set of targets belonging to six protein families, the multi-assay-based methods outperform chemogenomics-based approaches by 4.33%.
AB - Structure-activity relationship (SAR) models are used to inform and to guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper, we present a new class of methods for building SAR models, referred to as multi-assay based, that utilize activity information from different targets. These methods first identify a set of targets that are related to the target under consideration, and then they employ various machine learning techniques that utilize activity information from these targets in order to build the desired SAR model. We developed different methods for identifying the set of related targets, which take into account the primary sequence of the targets or the structure of their ligands, and we also developed different machine learning techniques that were derived by using principles of semi-supervised learning, multi-task learning, and classifier ensembles. The comprehensive evaluation of these methods shows that they lead to considerable improvements over the standard SAR models that are based only on the ligands of the target under consideration. On a set of 117 protein targets, obtained from PubChem, these multi-assay-based methods achieve a receiver-operating characteristic score that is, on the average, 7.0-7.2% higher than that achieved by the standard SAR models. Moreover, on a set of targets belonging to six protein families, the multi-assay-based methods outperform chemogenomics-based approaches by 4.33%.
UR - http://www.scopus.com/inward/record.url?scp=72949114936&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=72949114936&partnerID=8YFLogxK
U2 - 10.1021/ci900182q
DO - 10.1021/ci900182q
M3 - Article
C2 - 19842624
AN - SCOPUS:72949114936
SN - 1549-9596
VL - 49
SP - 2444
EP - 2456
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 11
ER -