Incremental window-based protein sequence alignment algorithms

Huzefa Rangwala; George Karypis

doi:10.1093/bioinformatics/btl297

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Motivation: Protein sequence alignment plays a critical role in computational biology, as it is an integral part of many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling. Methods: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest-scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST-generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO. Results: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also show that these algorithms can be used to provide better information as to which of the aligned positions are more reliable, a critical piece of information for comparative modeling applications.
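
The selection step described in the Methods can be illustrated with a small sketch. This is not the authors' implementation: it assumes fixed-length ungapped windows, a plain match/mismatch score in place of the PICASSO-derived profile-to-profile score, and a simple rule that a new window pair must neither overlap nor cross pairs already chosen. The names `window_score` and `greedy_window_alignment` are hypothetical.

```python
from itertools import product


def window_score(a, b, i, j, w, score):
    """Sum of per-position scores for the ungapped windows a[i:i+w] and b[j:j+w]."""
    return sum(score(a[i + k], b[j + k]) for k in range(w))


def greedy_window_alignment(a, b, w=3, score=lambda x, y: 1 if x == y else -1):
    """Greedy sketch: repeatedly take the best-scoring window pair that is
    consistent (non-overlapping, non-crossing) with the pairs already chosen.

    Assumption: identity scoring on raw residues stands in for a
    profile-to-profile score; real profiles would replace `score`.
    """
    # Score every pair of window start positions, best first
    # (ties broken by position so the result is deterministic).
    candidates = sorted(
        (
            (window_score(a, b, i, j, w, score), i, j)
            for i, j in product(range(len(a) - w + 1), range(len(b) - w + 1))
        ),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    chosen = []  # start positions (i, j) of accepted window pairs
    for s, i, j in candidates:
        if s <= 0:
            break  # remaining windows add no positive evidence
        # Accept only if the new pair lies entirely before or entirely
        # after every accepted pair, in both sequences.
        if all(
            (i + w <= ci and j + w <= cj) or (ci + w <= i and cj + w <= j)
            for ci, cj in chosen
        ):
            chosen.append((i, j))
    chosen.sort()
    # Expand each accepted window into aligned residue index pairs.
    return [(i + k, j + k) for i, j in chosen for k in range(w)]


if __name__ == "__main__":
    print(greedy_window_alignment("ACDEFGHIK", "ACDXXGHIK", w=3))
    # [(0, 0), (1, 1), (2, 2), (5, 5), (6, 6), (7, 7)]
```

In this toy run the final residue pair (8, 8) is left unaligned because no full window fits after the accepted ones; variable-length windows, which the abstract also mentions, are one way such leftovers could be picked up.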

Original language	English (US)
Pages (from-to)	e17-e23
Journal	Bioinformatics
Volume	23
Issue number	2
DOIs	https://doi.org/10.1093/bioinformatics/btl297
State	Published - 2007

Bibliographical note

Funding Information:
We would like to express our deepest thanks to Professor Arne Elofsson and Dr Robert C. Edgar for providing us their datasets and codes for the study. This work was supported by NSF EIA-9986042, ACI-0133464, IIS-0431135, NIH RLM008713A, the Army High Performance Computing Research Center contract number DAAD19-01-2-0014 and by the Digital Technology Center at the University of Minnesota.

Cite this

@article{e1effe49ae154fb18259d7b3d4e37bb2,

title = "Incremental window-based protein sequence alignment algorithms",

abstract = "Motivation: Protein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling. Methods: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO. Results: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable - a critical piece of information for comparative modeling applications.",

author = "Huzefa Rangwala and George Karypis",

note = "Funding Information: We would like to express our deepest thanks to Professor Arne Elofsson and Dr Robert C. Edgar for providing us their datasets and codes for the study. This work was supported by NSF EIA-9986042, ACI-0133464, IIS-0431135, NIH RLM008713A, the Army High Performance Computing Research Center contract number DAAD19-01-2-0014 and by the Digital Technology Center at the University of Minnesota.",

year = "2007",

doi = "10.1093/bioinformatics/btl297",

language = "English (US)",

volume = "23",

pages = "e17--e23",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "2",

}

TY - JOUR

T1 - Incremental window-based protein sequence alignment algorithms

AU - Rangwala, Huzefa

AU - Karypis, George

N1 - Funding Information: We would like to express our deepest thanks to Professor Arne Elofsson and Dr Robert C. Edgar for providing us their datasets and codes for the study. This work was supported by NSF EIA-9986042, ACI-0133464, IIS-0431135, NIH RLM008713A, the Army High Performance Computing Research Center contract number DAAD19-01-2-0014 and by the Digital Technology Center at the University of Minnesota.

PY - 2007

Y1 - 2007

AB - Motivation: Protein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling. Methods: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO. Results: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable - a critical piece of information for comparative modeling applications.

UR - http://www.scopus.com/inward/record.url?scp=33846705174&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846705174&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btl297

DO - 10.1093/bioinformatics/btl297

M3 - Article

C2 - 17237087

AN - SCOPUS:33846705174

SN - 1367-4803

VL - 23

SP - e17

EP - e23

JO - Bioinformatics

JF - Bioinformatics

IS - 2

ER -
