Motivation: Identifying residues that interact with ligands is useful as a first step to understanding protein function and as an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer and direct prediction using machine learning and compare it to previous sequence-based work and current structure-based methods. Results: Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, a current method, FINDSITE, based on structure features achieves an ROC of 0.81 with 54% precision at 50% recall, while LIBRUS achieves an ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance is increased beyond either reaching an ROC of 0.86 and 59% precision at 50% recall.
Bibliographical noteFunding Information:
Funding: National Institute of Health (T32GM008347, RLM008713A); the National Science Foundation (IIS-0431135, IIS-0905220); the University of Minnesota Digital Technology Center.