LIBRUS: Combined machine learning and homology information for sequence-based ligand-binding residue prediction

Chris Kauffman; George Karypis

doi:10.1093/bioinformatics/btp561

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

Motivation: Identifying residues that interact with ligands is useful as a first step to understanding protein function and as an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer and direct prediction using machine learning and compare it to previous sequence-based work and current structure-based methods. Results: Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, a current method, FINDSITE, based on structure features achieves an ROC of 0.81 with 54% precision at 50% recall, while LIBRUS achieves an ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance improves beyond either method alone, reaching an ROC of 0.86 and 59% precision at 50% recall.
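The abstract reports performance as the area under the ROC curve and as precision at 50% recall over per-residue predictions. The Python sketch below illustrates how these two figures can be computed from per-residue scores and binary binding labels (1 = ligand-binding residue). It is not the authors' evaluation code: the function names, the pairwise (Mann-Whitney) formulation of the ROC area, and the tie handling are illustrative assumptions.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic (assumed formulation).

    Over all (binding, non-binding) residue pairs, count how often the binding
    residue receives the higher score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one non-binding residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_recall(
    scores: Sequence[float], labels: Sequence[int], target_recall: float = 0.5
) -> float:
    """Precision at the first cutoff where recall reaches target_recall.

    Residues are ranked by descending score; tied scores keep their input
    order (a simplifying assumption).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one binding residue")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        if true_pos >= target_recall * total_pos:
            return true_pos / k  # precision among the top-k predictions
    return true_pos / len(ranked)


# Hypothetical toy data: six residues, three of which bind a ligand.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
print(roc_auc(scores, labels))              # 5 of 9 pairs ranked correctly
print(precision_at_recall(scores, labels))  # 2 binding residues in the top 3
```

With three binding residues, 50% recall is first reached at the third-ranked residue, where two of the three predictions are correct, so precision is 2/3. For larger benchmarks, a library routine that sorts once would replace the quadratic pairwise loop.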

Original language	English (US)
Article number	btp561
Pages (from-to)	3099-3107
Number of pages	9
Journal	Bioinformatics
Volume	25
Issue number	23
DOIs	https://doi.org/10.1093/bioinformatics/btp561
State	Published - Sep 28 2009

Bibliographical note

Funding Information:
Funding: National Institutes of Health (T32GM008347, RLM008713A); the National Science Foundation (IIS-0431135, IIS-0905220); the University of Minnesota Digital Technology Center.

Cite this

@article{c21619bf57204725b978db22156d2bd0,
  title = "{LIBRUS}: Combined machine learning and homology information for sequence-based ligand-binding residue prediction",
  abstract = "Motivation: Identifying residues that interact with ligands is useful as a first step to understanding protein function and as an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer and direct prediction using machine learning and compare it to previous sequence-based work and current structure-based methods. Results: Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45\% precision at 50\% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, a current method, FINDSITE, based on structure features achieves an ROC of 0.81 with 54\% precision at 50\% recall, while LIBRUS achieves an ROC of 0.82 with 39\% precision at 50\% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance is increased beyond either reaching an ROC of 0.86 and 59\% precision at 50\% recall.",
  author = "Chris Kauffman and George Karypis",
  note = "Funding Information: Funding: National Institutes of Health (T32GM008347, RLM008713A); the National Science Foundation (IIS-0431135, IIS-0905220); the University of Minnesota Digital Technology Center.",
  year = "2009",
  month = sep,
  day = "28",
  doi = "10.1093/bioinformatics/btp561",
  language = "English (US)",
  volume = "25",
  pages = "3099--3107",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "23",
}

TY  - JOUR
T1  - LIBRUS: Combined machine learning and homology information for sequence-based ligand-binding residue prediction
AU  - Kauffman, Chris
AU  - Karypis, George
N1  - Funding Information: Funding: National Institutes of Health (T32GM008347, RLM008713A); the National Science Foundation (IIS-0431135, IIS-0905220); the University of Minnesota Digital Technology Center.
PY  - 2009/09/28
Y1  - 2009/09/28
N2  - Motivation: Identifying residues that interact with ligands is useful as a first step to understanding protein function and as an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer and direct prediction using machine learning and compare it to previous sequence-based work and current structure-based methods. Results: Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, a current method, FINDSITE, based on structure features achieves an ROC of 0.81 with 54% precision at 50% recall, while LIBRUS achieves an ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance is increased beyond either reaching an ROC of 0.86 and 59% precision at 50% recall.
AB  - Motivation: Identifying residues that interact with ligands is useful as a first step to understanding protein function and as an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer and direct prediction using machine learning and compare it to previous sequence-based work and current structure-based methods. Results: Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, a current method, FINDSITE, based on structure features achieves an ROC of 0.81 with 54% precision at 50% recall, while LIBRUS achieves an ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance is increased beyond either reaching an ROC of 0.86 and 59% precision at 50% recall.
UR  - http://www.scopus.com/inward/record.url?scp=75949116562&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=75949116562&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btp561
DO  - 10.1093/bioinformatics/btp561
M3  - Article
C2  - 19786483
AN  - SCOPUS:75949116562
SN  - 1367-4803
VL  - 25
SP  - 3099
EP  - 3107
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 23
M1  - btp561
ER  - 
